Abstract

The three-input TOFFOLI gate is the workhorse of circuit synthesis for classical logic
operations on quantum data, e.g., reversible arithmetic circuits. In physical implementations,
however, TOFFOLI gates are decomposed into six CNOT gates and several one-qubit
gates. Though this decomposition has been known for at least 10 years, we provide here
the first demonstration of its CNOT-optimality.

We study three-qubit circuits which contain fewer than six CNOT gates and implement
a block-diagonal operator, then show that they implicitly describe the cosine-sine
decomposition of a related operator. Leveraging the canonicity of such decompositions to limit
one-qubit gates appearing in respective circuits, we prove that the n-qubit analogue of the
TOFFOLI requires at least 2n CNOT gates. Additionally, our results offer a complete
classification of three-qubit diagonal operators by their CNOT-cost, which holds even if ancilla
qubits are available.

1 Introduction



The three-qubit TOFFOLI gate appears in key quantum logic circuits, such as those for
modular exponentiation. However, in physical implementations it must be decomposed
into one- and two-qubit gates. Figure 1 reproduces the textbook circuit from [14] with six
CNOT gates, as well as Hadamard ($H$), $T = \exp(i\pi\sigma_z/8)$ and $T^\dagger$ gates.

Figure 1: Decomposing the TOFFOLI gate into one-qubit and six CNOT gates.



The pursuit of efficient circuits for standard gates has a long and rich history. DiVincenzo
and Smolin found numerical evidence [4] that five two-qubit gates are necessary and
sufficient to implement the TOFFOLI. Margolus showed that a phase-modified TOFFOLI
gate admits a three-CNOT implementation [6, 12], whose optimality was eventually
demonstrated by Song and Klappenecker [20]. Unfortunately, this MARGOLUS gate can replace
TOFFOLI only in rare cases. The detailed case analysis used in the optimality proof from
[20] does not extend easily to circuits with four or five CNOTs. The omnibus Barenco et
al. paper offers circuits for many standard gates, including an eight-CNOT circuit for the
TOFFOLI [1, Corollary 6.2], as well as a six-CNOT circuit for the controlled-controlled-$\sigma_z$,
which differs from the TOFFOLI only by one-qubit operators [1, Section 7]. Problem
4.4b of the textbook by Nielsen and Chuang asks whether the circuit of Figure 1 could be
improved. The problem was marked as unsolved, and we report the following progress.

Theorem 1 A circuit consisting of CNOT gates and one-qubit gates which implements the
n-qubit TOFFOLI gate without ancillae requires at least 2n CNOT gates. For n = 3, this
bound holds even when ancillae are permitted, and is achieved by the circuit of Figure 1.
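The achievability half of Theorem 1 can be checked numerically. The sketch below multiplies out one commonly quoted six-CNOT circuit for the TOFFOLI and compares it with the TOFFOLI matrix. Two things are assumptions rather than content of the paper: the gate ordering is the widely used textbook one, not a transcription of Figure 1, and $T$ is taken as diag(1, e^{i\pi/4}), which differs from $\exp(i\pi\sigma_z/8)$ only by a sign convention and a global phase. The helper names (`embed`, `cnot`, `g`) are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
T = np.diag([1, np.exp(1j * np.pi / 4)])  # assumed convention for T
TDG = T.conj().T
P0 = np.diag([1, 0]).astype(complex)      # projector |0><0|
P1 = np.diag([0, 1]).astype(complex)      # projector |1><1|


def embed(ops, n=3):
    """Tensor product over n qubits; `ops` maps qubit index -> 2x2 gate.

    Qubit 0 is the most significant qubit (leftmost Kronecker factor).
    """
    out = np.eye(1, dtype=complex)
    for q in range(n):
        out = np.kron(out, ops.get(q, I2))
    return out


def cnot(c, t):
    """CNOT with control c and target t on three qubits."""
    return embed({c: P0}) + embed({c: P1, t: X})


def g(gate, q):
    """One-qubit gate on qubit q."""
    return embed({q: gate})


# Gates in the order they are applied; qubits 0 and 1 are controls, 2 is the target.
circuit = [
    ("1q", g(H, 2)),
    ("cx", cnot(1, 2)), ("1q", g(TDG, 2)),
    ("cx", cnot(0, 2)), ("1q", g(T, 2)),
    ("cx", cnot(1, 2)), ("1q", g(TDG, 2)),
    ("cx", cnot(0, 2)), ("1q", g(T, 1)), ("1q", g(T, 2)),
    ("1q", g(H, 2)),
    ("cx", cnot(0, 1)), ("1q", g(T, 0)), ("1q", g(TDG, 1)),
    ("cx", cnot(0, 1)),
]

U = np.eye(8, dtype=complex)
for _, m in circuit:
    U = m @ U  # later gates multiply on the left

toffoli = np.eye(8, dtype=complex)
toffoli[[6, 7]] = toffoli[[7, 6]]  # swap |110> and |111>

cnot_count = sum(label == "cx" for label, _ in circuit)
is_toffoli = np.allclose(U, toffoli)
print(cnot_count, is_toffoli)
```

The check only confirms the upper bound of six CNOTs; the lower bound is the subject of the rest of the paper and is not something a numerical test of one circuit can establish.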

Our main tool is the Cartan decomposition in its "KAK" form, which provides a
Lie-theoretic generalization of the singular-value decomposition [8]. Several special cases
have previously proven useful for the synthesis and analysis of quantum circuits, notably
the two-qubit magic decomposition [10, 11, 24, 23, 22, 16, 17], the cosine-sine
decomposition [7, 3, 13, 18], and the demultiplexing decomposition [18]. The canonicity of
the two-qubit canonical decomposition was used previously to perform CNOT-counting for
two-qubit operators [10]. The magic decomposition is a two-qubit phenomenon,^1 but the
cosine-sine and demultiplexing decompositions hold for n-qubit operators and enjoy
similar canonicity. Moreover, the components of these decompositions are multiplexors [18],
that is, block-diagonal operators that commute with many common circuit elements.
Commutation properties facilitate circuit restructuring that can dramatically reduce the number of
circuit topologies to be considered in proofs. These results and observations allow us to
perform CNOT-counting using the Cartan decomposition in a divide-and-conquer manner.

1 While the Cartan decomposition SU(n) = SO(n) · [diagonals] · SO(n) is general, the utility of the magic
decomposition arises from the isomorphism SU(2) × SU(2) ≅ SO(4) being represented as an inner automorphism
of SU(4). Such coincidental isomorphisms are few and confined to low dimensions.

In the remaining part of this paper, we first review basic properties of quantum gates
in Section 2 and make several elementary simplifications to reduce the complexity of the
subsequent case analysis. In particular, we pass from the CNOT and TOFFOLI gates to
the symmetric, diagonal CZ and CCZ gates, and recall circuit decompositions which yield
operators commuting with Z and CZ gates. We also define qubit-local CZ-costs, and observe
that the total CZ-cost can be lower-bounded by half the sum of the local CZ counts for each
qubit. Though weak, this bound suffices for our purposes and we can compute it in simple
cases. Further technique is developed in Section 3, where we compute matrix entries to
derive constraints on gates from circuit equations. This approach was employed by Song
and Klappenecker in the two-qubit case, and we generalize several of their results to
n-qubit circuits.

Section 4 is the heart of the present work, in which we prove our result on the
CNOT-cost of the TOFFOLI gate. It starts by motivating and outlining the methods involved,
previews key intermediate results, and proves that the CNOT-cost of the TOFFOLI is 6, based
on these results. In Section 4.2 we use the canonicity of the cosine-sine decomposition
to derive circuit constraints. Section 4.1, motivated by [17], employs the canonicity of the
demultiplexing decomposition, captured by a spectral invariant, to lower-bound CZ gates
required in circuit implementations of operators. The results apply, mutatis mutandis, to
CNOT-based implementations as well. Finally, in Section 4.3 we deduce as corollaries that
the three-qubit PERES gate requires exactly 5 CNOTs and the n-qubit TOFFOLI gate requires
at least 2n. In Section 5 we extend our techniques to all three-qubit diagonal operators,
completely classifying them according to CZ-cost. Generalizations to circuits with ancillae
are obtained in Section 6. Concluding discussion can be found in Section 7.

2 Preliminaries 

We review notation and properties of useful quantum gates, then characterize operators
that commute with Pauli-Z gates on multiple qubits. We then review circuit
decompositions from [3, 13, 18]. Finally, we introduce terminology appropriate for quantifying gate
costs of unitary operators in terms of CNOT and CZ gates, and state elementary but useful
observations about these costs.

2.1 Notation and properties of standard quantum gates 

We write $X, Y, Z$ for the Pauli operators, and CX, CCX for CNOT, TOFFOLI. Rotation gates
$\exp(iZ\theta)$ are denoted by $R_z(\theta)$,^2 and we analogously use $R_x$, $R_y$. We work throughout on
some fixed number of qubits $N$. For a one-qubit gate $g$ and a qubit $\ell$, we denote by $g^{(\ell)}$
the $N$-qubit operator implemented by applying the gate $g$ on qubit $\ell$. Similarly, $C^{(i)}X^{(j)}$
is the operator implemented by a controlled-X with the control on qubit $i$ and target on
qubit $j$. The controlled-Z being symmetric with respect to exchanging qubits, we do not
distinguish control from target in the notation $CZ^{(i,j)}$. We similarly denote the operator of
a controlled-controlled-Z on qubits $i,j,k$ by $CCZ^{(i,j,k)}$. In choosing qubit labels, we follow
throughout the convention that the high-to-low significance order of qubits is the same as
the lexicographic order of their labels.

2 We omit the factor of ±1/2 used by other authors.

                   | CNOT and TOFFOLI                               | CZ and CCZ
  -----------------+------------------------------------------------+---------------------------
  Advantages       | With one-qubit gates added, either CNOT or CZ would be universal
                   | Implement addition and multiplication          | Symmetric
                   | Universal for reversible computation           | Fewer circuit topologies
                   | Block-diagonal                                 | Diagonal
                   | With 1-qubit diagonals, implement any diagonal | Commute with Z on target
                   | Commute with X on target                       |
  Other properties | Change direction after two H-conjugations      |
                   | One can map back and forth by H-conjugation on target
  Applications     | Circuit synthesis                              | Circuit analysis

Table 1: Relative advantages of standard controlled gates.

We follow the standard but sometimes confusing convention that typeset operators act
on vectors from the left, but circuit diagrams process inputs from left to right. Consistently
with the established notation for the CNOT gate, we denote the X gate by "⊕" in circuit
diagrams. We denote the Z gate by a "•" symbol, which does not lead to ambiguity in the
matching notation for CZ because CZ is symmetric. Thus the following diagram expresses
the identity $CZ^{(\ell,m)} X^{(\ell)} = Z^{(m)} X^{(\ell)} CZ^{(\ell,m)}$ and rearranges gates in quantum circuits, like de
Morgan's law does in digital logic.

[Circuit diagram (1): the identity above drawn as a circuit.]



Another standard identity relates the X, Z, and one-qubit HADAMARD (H) gates: $HXH = Z$.
By case analysis on control qubits, one obtains the further identities
$H^{(j)} C^{(i)}X^{(j)} H^{(j)} = CZ^{(i,j)}$ and $H^{(k)} C^{(i)}C^{(j)}X^{(k)} H^{(k)} = CCZ^{(i,j,k)}$.
Despite this equivalence, we prefer the X family of gates
for some applications and the Z family for others, as summarized in Table 1.
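The identities of this subsection are small enough to confirm by direct matrix multiplication. The following sketch, in Python with NumPy, checks $HXH = Z$, the two H-conjugation identities, and the commutation rule of Equation 1 for the case where $\ell$ is the first qubit and $m$ the second. The variable names are illustrative and not taken from the paper.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
P0 = np.diag([1.0, 0.0])
P1 = np.diag([0.0, 1.0])

# Two-qubit gates, first qubit = control (most significant).
CX = np.kron(P0, I2) + np.kron(P1, X)
CZ = np.diag([1.0, 1.0, 1.0, -1.0])

# Three-qubit gates, first two qubits = controls.
CCX = np.eye(8)
CCX[[6, 7]] = CCX[[7, 6]]
CCZ = np.diag([1.0] * 7 + [-1.0])

hxh_is_z = np.allclose(H @ X @ H, Z)

# H-conjugation on the target turns CNOT into CZ and TOFFOLI into CCZ.
cx_to_cz = np.allclose(np.kron(I2, H) @ CX @ np.kron(I2, H), CZ)
ccx_to_ccz = np.allclose(np.kron(np.eye(4), H) @ CCX @ np.kron(np.eye(4), H), CCZ)

# Equation 1: CZ X^(l) = Z^(m) X^(l) CZ, with l the first and m the second qubit.
X_l = np.kron(X, I2)
Z_m = np.kron(I2, Z)
equation_1 = np.allclose(CZ @ X_l, Z_m @ X_l @ CZ)

print(hxh_is_z, cx_to_cz, ccx_to_ccz, equation_1)
```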

Circuits consisting entirely of one-qubit gates and CZ (respectively CNOT) gates will be
called CZ-circuits (respectively CNOT-circuits). Using the above identities, CZ-circuits and
CNOT-circuits can be interchanged at the cost of adding one-qubit H gates. It will also be
convenient to consider $CZ^{(\ell)}$-circuits, which by definition are arbitrary circuits where all
multi-qubit gates touching qubit $\ell$ are CZ. While these are not a subclass of CZ-circuits, a
$CZ^{(\ell)}$-circuit can be converted into a CZ-circuit without any changes affecting qubit $\ell$.



2.2 Operators commuting with Z 

We now recall terminology for operators commuting with Z on some qubits, but possibly
not all qubits. Further background on the circuit theory of these quantum multiplexors can
be found in [18].

The control-on-box notation of the following diagram indicates that the operator $U$
commutes with $Z^{(\ell)}$. The backslash on the bottom line indicates an arbitrary number of
qubits (a multi-qubit bus).

These operators include the commonly-used positively and negatively controlled-U gates,
although in our notation $U$ also acts on the control qubits (and is thus "larger than the box
in which it is contained"). In general, operators which commute with Z are block-diagonal:

Observation 2 For a unitary operator $Q$ and qubit $\ell$, consider the one-qubit values $|0\rangle^{(\ell)}$
and $|1\rangle^{(\ell)}$ on the $\ell$-th input and output qubits of the operator. The following are equivalent.

• $Q$ commutes with $Z^{(\ell)}$

• $\langle 0|^{(\ell)}\, Q\, |1\rangle^{(\ell)} = 0$

• $\langle 1|^{(\ell)}\, Q\, |0\rangle^{(\ell)} = 0$

• $Q$ admits a decomposition $Q = |0\rangle\langle 0| \otimes Q_0 + |1\rangle\langle 1| \otimes Q_1$ where the projectors $|i\rangle\langle i|$
operate on qubit $\ell$ and the unitaries $Q_i$ operate on the qubits other than $\ell$.

In an appropriate basis, the matrix of $Q$ is block-diagonal. Its blocks represent the "then"
and "else" branches of the quantum multiplexor $Q$ with select qubit $\ell$.

Notation. If $Q$ commutes with $Z^{(\ell)}$ and $\ell$ is clear from context, we denote $Q$'s diagonal
blocks $\langle j|^{(\ell)}\, Q\, |j\rangle^{(\ell)}$ by $Q_j$. Similarly, if $Q$ commutes with $Z^{(\ell_i)}$ on multiple qubits
$\ell_1 \ldots \ell_k$, then for any bitstring $j_1 \ldots j_k$ we write $Q_{j_1 \ldots j_k}$ for
$\langle j_1 \ldots j_k|^{(\ell_1 \ldots \ell_k)}\, Q\, |j_1 \ldots j_k\rangle^{(\ell_1 \ldots \ell_k)}$.

When the $\ell_i$ include all the qubits, $Q$ is diagonal and the $Q_{j_1 \ldots j_k}$ are its diagonal
entries. In general, the $Q_{j_1 \ldots j_k}$ capture diagonal blocks of $Q$ with respect to an ordering of
computational-basis vectors in which qubits $\ell_1 \ldots \ell_k$ are the most significant qubits.

We now point out the following commutability. 

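Observation 3 Let $Q, R$ be two gates such that for every qubit $\ell$, either one of them does
not affect $\ell$, or both of them commute with $Z^{(\ell)}$. Then $QR = RQ$. In picture:

[Circuit diagram: the gates Q and R, drawn in either order, compute the same operator.]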

We now recall the multiplexed rotation gates [13, 18], which generalize the $R_x, R_y, R_z$
gates. Let $A$ be a diagonal Hermitian matrix acting on the qubits $\ell_1, \ldots, \ell_k$ and fix another
qubit $m \ne \ell_i$. We define the operator $R_z^{(m)}(A)$ on the qubits $\ell_1, \ldots, \ell_k, m$ by the conditions (1)
that it commute with $Z^{(\ell_i)}$ for all $i$, and (2) for any bitstring $j_1 \ldots j_k$, we have
$R_z^{(m)}(A)_{j_1 \ldots j_k} = R_z(A_{j_1 \ldots j_k})$. Explicitly, $R_z^{(m)}(A) = \exp\bigl(i\, Z^{(m)} A^{(\ell_1 \ldots \ell_k)}\bigr)$.
Multiplexed $R_x, R_y$ gates are defined
similarly. Since such operators commute with $Z^{(\ell_i)}$, we depict them in circuit diagrams
with the appropriate control-on-boxes.
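As a concrete illustration of the definition, the sketch below builds multiplexed $R_z$ and $R_y$ gates with two select qubits (most significant) and one target qubit (least significant), using the convention $R_z(\theta) = \exp(iZ\theta)$ stated above. The helper `multiplex` and the sample angles are hypothetical; the code only checks the defining properties: block structure, the explicit exponential formula for $R_z$, and commutation with $Z$ on the select qubits.

```python
import numpy as np

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])


def rz(t):
    """R_z(t) = exp(i Z t), without the 1/2 factor."""
    return np.diag([np.exp(1j * t), np.exp(-1j * t)])


def ry(t):
    """R_y(t) = exp(i Y t) = cos(t) I + sin(t) (iY)."""
    return np.array([[np.cos(t), np.sin(t)], [-np.sin(t), np.cos(t)]], dtype=complex)


def multiplex(blocks):
    """Block-diagonal operator: block j acts on the target when the selects read j."""
    n = len(blocks)
    out = np.zeros((2 * n, 2 * n), dtype=complex)
    for j, b in enumerate(blocks):
        out[2 * j:2 * j + 2, 2 * j:2 * j + 2] = b
    return out


def commutes(a, b):
    return np.allclose(a @ b, b @ a)


# Diagonal of A, one angle per select bitstring 00, 01, 10, 11 (sample values).
angles = [0.3, -1.1, 0.7, 2.0]
mux_rz = multiplex([rz(t) for t in angles])
mux_ry = multiplex([ry(t) for t in angles])

# Explicit formula: exp(i Z^(m) A) is diagonal with entries exp(+-i A_j).
explicit_rz = np.diag(np.exp(1j * np.kron(angles, [1.0, -1.0])))

Z_sel0 = np.kron(Z, np.eye(4))
Z_sel1 = np.kron(np.kron(I2, Z), I2)
Z_target = np.kron(np.eye(4), Z)

print(np.allclose(mux_rz, explicit_rz))
print(commutes(mux_ry, Z_sel0), commutes(mux_ry, Z_sel1), commutes(mux_ry, Z_target))
```

The multiplexed $R_y$ gate is the more informative test case: it commutes with $Z$ on both select qubits but not on the target, which is exactly what the control-on-box notation records.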

It is natural to ask when an operator commuting with various Z gates can be
implemented in a CZ-circuit containing only gates commuting with the same Z gates. The answer
is given in terms of the partial determinant.

Definition 4 Fix qubits $\ell_1 \ldots \ell_k$. We define the partial determinant map $\det_{\ell_1 \ldots \ell_k}$ from
the operators commuting with $Z^{(\ell_1)}, \ldots, Z^{(\ell_k)}$ to the diagonal operators acting only on the
qubits $\ell_i$. It is given by $(\det_{\ell_1 \ldots \ell_k}(U))_{j_1 \ldots j_k} = \det(U_{j_1 \ldots j_k})$.

When computing partial determinants of a single gate or subcircuit acting on $m$ qubits,
we first tensor respective operators with $I_{2^{N-m}}$ to form operators acting on all $N$ qubits
(which may affect the determinants). When applied to such "full" operators, the partial
determinant mapping is a group homomorphism.

Proposition 5 Fix qubits $\ell_1 \ldots \ell_k$ among $N > k$ qubits. A unitary $U$ commuting with
$Z^{(\ell_1)}, \ldots, Z^{(\ell_k)}$ can be implemented by a CZ-circuit in which only diagonal gates operate on
qubits $\ell_j$ if and only if $\det_{\ell_1 \ldots \ell_k}(U)$ is separable (can be implemented by one-qubit gates).

Proof. (⇒). It suffices to show the separability of $\det_{\ell_1 \ldots \ell_k}(U)$ for a generating set of
operators. By definition, such a generating set is provided by CZs, one-qubit diagonals on
the $\ell_j$, and gates not affecting any of the $\ell_i$.

Note first that any diagonal gate $D$ acting on qubits $\ell_1, \ldots, \ell_k$ has partial determinant
given by $\det_{\ell_1 \ldots \ell_k}(D) = D^{2^{N-k}}$, understood as an operator on qubits $\ell_1 \ldots \ell_k$. In particular, if
$D$ is separable, then so is $\det_{\ell_1 \ldots \ell_k}(D)$. If $D = CZ^{(\ell_i,\ell_j)}$, then from $CZ^2 = I$ and $N > k$ we
deduce $\det_{\ell_1 \ldots \ell_k}(CZ^{(\ell_i,\ell_j)}) = I$. The remaining gates we need to consider are:

(i) any gate not affecting qubits $\ell_i$ implements $U = I^{(\ell_1 \ldots \ell_k)} \otimes Q$ for some $Q$.
In this case $U_{j_1 \ldots j_k} = Q$, and furthermore $\det_{\ell_1 \ldots \ell_k}(U) = \det(Q)\, I$.

(ii) CZ gates connecting qubits $\ell_i$ and $m \notin \{\ell_1, \ldots, \ell_k\}$. We compute
$\det_{\ell_1 \ldots \ell_k}(CZ^{(\ell_i,m)}) = (Z^{(\ell_i)})^{2^{N-k-1}}$.

(⇐). This part of the result is not used in the rest of the paper, and we therefore defer
the proof to the Appendix. ■
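The partial determinants computed in the proof can be reproduced numerically. The sketch below is a minimal illustration for $N = 3$ and $k = 2$, assuming that the select qubits $\ell_1, \ell_2$ are the two most significant qubits and the remaining qubit $m$ is the least significant one. The function name `partial_det` is hypothetical.

```python
import numpy as np

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
CZ = np.diag([1.0, 1.0, 1.0, -1.0])


def partial_det(U, k):
    """Partial determinant over the k most significant qubits.

    U must commute with Z on those qubits, i.e. be block-diagonal with
    2**k equal-sized blocks; the result is a 2**k x 2**k diagonal matrix.
    """
    block = U.shape[0] >> k
    dets = [np.linalg.det(U[j * block:(j + 1) * block, j * block:(j + 1) * block])
            for j in range(2 ** k)]
    return np.diag(dets)


N, k = 3, 2

# A diagonal gate D on the select qubits: det(D) = D^(2^(N-k)) = D^2 here.
D = np.diag(np.exp(1j * np.array([0.2, -0.9, 1.4, 0.5])))
det_D = partial_det(np.kron(D, I2), k)

# CZ between the two select qubits: partial determinant is the identity.
det_cz_sel = partial_det(np.kron(CZ, I2), k)

# Case (i): a gate Q that does not touch the select qubits (Q = H, det(H) = -1).
det_other = partial_det(np.kron(np.eye(4), H), k)

# Case (ii): CZ between l_2 and m gives (Z^(l_2))^(2^(N-k-1)) = Z^(l_2).
det_cz_mixed = partial_det(np.kron(I2, CZ), k)

print(np.allclose(det_D, D @ D))
print(np.allclose(det_cz_sel, np.eye(4)))
print(np.allclose(det_other, np.linalg.det(H) * np.eye(4)))
print(np.allclose(det_cz_mixed, np.kron(I2, Z)))
```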

2.3 Cartan decompositions in quantum logic 

This section recalls two important operator decompositions (cosine-sine and
demultiplexing) and casts them as circuit decompositions. Readers willing to accept their use in our
proofs may skip to Section 2.4.

Observe that an operator can be implemented with a single one-qubit gate if and only if 
it commutes with the Pauli operators Z and X on all other qubits. Thus to produce a CNOT- 
circuit for a given operator U, one may use the following algorithmic framework. 

1. Decompose U into a circuit in which each non-CNOT gate, V, W, . . . , commutes with
X and Z on more qubits than U does.

2. Apply the algorithm recursively to V, W, . . . until one-qubit gates are reached. 

As $Z$ is self-adjoint, the requirement that $U$ commutes with $Z^{(\ell)}$ can be rephrased as the
condition that $U$ is fixed under the involution $U \mapsto Z^{(\ell)} U Z^{(\ell)}$. Given such an involution, a
fundamental Lie-theoretic result produces an operator decomposition [8]. Here we recite
the result for completeness, but do not require the reader to understand all terminology.

The Cartan Decomposition. Let $G$ be a reductive Lie group, and $\iota : G \to G$ an involution.
Let $K = \{g : \iota(g) = g\}$ and $A$ be maximal over subgroups contained in $\{g : \iota(g) = g^{-1}\}$.
Then $K$ is reductive, $A$ is abelian, and $G = KAK$.

In order to restate decompositions of unitary operators as circuit decompositions, we
employ the notation of set-valued quantum gates [18]. Completely unlabelled gates (as in
Equation 4) denote the set of all gates satisfying all control-on-box commutativity
conditions imposed by the diagram, and gates labelled $R_x, R_y, R_z$ denote the appropriate set of
(possibly multiplexed) rotations. An equivalence of circuits with set-valued gates means
that if we pick an element from each set on one side, there is a way to choose elements on
the other so that the two circuits compute the same operator. The backslashed wires which
usually indicate multiple qubits may also carry zero qubits.

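The involution $\phi_z : U \mapsto Z^{(\ell)} U Z^{(\ell)}$ corresponds to the cosine-sine decomposition.^3

[Circuit diagram (2): an arbitrary gate equals a multiplexor with select qubit $\ell$, followed by a
multiplexed $R_y$ gate on qubit $\ell$, followed by another multiplexor with select qubit $\ell$.]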



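The involution $\phi_y : U \mapsto Y^{(\ell)} U Y^{(\ell)}$ yields the demultiplexing decomposition [18].

[Circuit diagram (3): a multiplexor with select qubit $\ell$ equals a gate on the remaining qubits,
followed by a multiplexed $R_z$ gate on qubit $\ell$, followed by another gate on the remaining qubits.]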

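The map $\phi_y$ restricts to the subgroup of diagonal operators. This group being abelian,
the $K$ and $A$ factors commute, leaving the following decomposition of diagonal operators.

[Circuit diagram (4): a diagonal operator equals a multiplexed $R_z$ gate on one qubit together
with a diagonal operator on the remaining qubits.]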

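The involution $\phi_y$ further restricts to the subgroup of multiplexed Z rotations, which
we can demultiplex again. The $K$ and $A$ factors again commute; the $A$ factor is computed
by the last 3 gates in the circuit below.

[Circuit diagram (5): a multiplexed $R_z$ gate equals a multiplexed $R_z$ gate with one fewer
select qubit, followed by a CNOT, a second such multiplexed $R_z$ gate, and another CNOT.]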
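The smallest instance of this demultiplexing step can be verified directly. The sketch below assumes a single select qubit and the convention $R_z(\theta) = \exp(iZ\theta)$; the angle formulas $\alpha = (\theta_0 + \theta_1)/2$ and $\beta = (\theta_0 - \theta_1)/2$ are worked out here for illustration and are not quoted from the paper.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
P0 = np.diag([1.0, 0.0])
P1 = np.diag([0.0, 1.0])
CX = np.kron(P0, I2) + np.kron(P1, X)  # control = first (select) qubit


def rz(t):
    """R_z(t) = exp(i Z t)."""
    return np.diag([np.exp(1j * t), np.exp(-1j * t)])


# Multiplexed R_z: apply R_z(theta0) if the select qubit is 0, R_z(theta1) if it is 1.
theta0, theta1 = 0.8, -0.35
mux = np.kron(P0, rz(theta0)) + np.kron(P1, rz(theta1))

# Demultiplexed form: one-qubit R_z gates on the target and two CNOTs.
alpha = (theta0 + theta1) / 2
beta = (theta0 - theta1) / 2
demux = np.kron(I2, rz(alpha)) @ CX @ np.kron(I2, rz(beta)) @ CX

print(np.allclose(mux, demux))
```

The identity works because conjugating $R_z(\beta)$ by X flips the sign of the angle, so the two CNOTs turn a plain rotation into one whose direction depends on the select qubit.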



To establish the existence of these decompositions, it remains to verify in each case
that the purported $K$ and $A$ satisfy the appropriate properties with respect to the relevant
involution. This can be checked after passing to the Lie algebra, where it is easy.
Alternatively, explicit constructions of the cosine-sine and demultiplexing decompositions are
given in [15] and [18], respectively.

To decompose general n-qubit operators, Equation 2 can be applied iteratively until all
remaining gates are either multiplexed $R_y$ gates or diagonal. The $R_y$ gates can be replaced
by $R_z$ gates at the cost of introducing some one-qubit operators; the $R_z$ and other
diagonal gates can be decomposed as described above; for details and optimizations see [13].
Smaller circuits are obtained by another algorithm, which alternates cosine-sine
decompositions with demultiplexing decompositions; for details and optimizations, see [18].



3 The terminology comes from the numerical linear algebra literature; see [15] and references therein.

When circuit decompositions are applied recursively, some gates can be reduced by
local circuit transformations. For example, when iteratively demultiplexing multiplexed $R_z$
gates, some CNOTs may be cancelled as shown below.

This technique produces a circuit with $2^n$ CNOT gates for an n-ply multiplexed $R_z$ gate.
Using Equation 4 we obtain a circuit with $2^n - 2$ CNOT gates for an arbitrary n-qubit
diagonal operator [3]. Applying this result to the CCZ gate leads to the circuit in Figure 1.



2.4 Basic facts about CZ-counting 

The CZ-cost $|U|_{CZ}$ of an $N$-qubit operator $U$ is the minimum number of CZs which
appear in any $N$-qubit CZ-circuit for $U$; we define the CNOT-cost analogously. The identity
$H^{(j)} C^{(i)}X^{(j)} H^{(j)} = CZ^{(i,j)}$ ensures that $|U|_{CZ} = |U|_{CNOT}$. The further identity
$H^{(k)} C^{(i)}C^{(j)}X^{(k)} H^{(k)} = CCZ^{(i,j,k)}$ yields:

Observation 6 $|CCZ|_{CZ} = |CCX|_{CNOT} \le 6$.

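By way of illustration, the following modification of the circuit in Figure 1 implements
the CCZ in terms of CZs.

[Circuit diagram (6): the circuit of Figure 1 rewritten with CZ gates; the one-qubit gates
appear H-conjugated, with labels such as $TH$, $HTH$, $HT^\dagger H$ and $H$.]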

It shall prove more convenient to compute $|CCZ|_{CZ}$ rather than $|CCX|_{CNOT}$. To do so, we
are going to study the number of CZs which must touch a given qubit in any CZ-circuit
for a given operator. More precisely, the $CZ^{(\ell)}$-cost $|U|_{CZ;\ell}$ is the minimum number of CZ
gates incident on $\ell$ in any $CZ^{(\ell)}$-circuit for $U$. These cost functions are related through the
following estimate.^4

Observation 7 For any operator $P$,

$|P|_{CZ} \ge \frac{1}{2} \sum_j |P|_{CZ;j}$.

4 This bound is very weak in general. Dimension-counting shows that a generic $N$-qubit operator $U$ requires
on the order of $4^N$ CZ gates [9], whereas the results of [18] imply that $|U|_{CZ;j} < 6N$. At best we can establish that
$|U|_{CZ} > N(6N-1)$.

Proof. Each CZ gate touches two qubits. ■

As the costs $|CCZ|_{CZ;j}$ are the same for $j = 1,2,3$ (by symmetry),

$|CCZ|_{CZ} \ge \frac{3}{2}\, |CCZ|_{CZ;\ell}$    (7)

We emphasize that the number of qubits, $N$, is an unspecified parameter in both $|\cdot|_{CZ}$
and $|\cdot|_{CZ;\ell}$. In the presence of ancillae, we define $|U|^{a}_{CZ} := \min_k |U \otimes I_{2^k}|_{CZ}$. Obviously
$|U|^{a}_{CZ} \le |U|_{CZ}$. While $|U|^{a}_{CZ} = |U|_{CZ}$ seems unlikely to always hold, we are not aware of
any counterexamples. Indeed, we will show in Section 6 that this equality holds for all
two-qubit operators and all three-qubit diagonal operators.



3 Deriving gate constraints from circuit equations 



The circuit decompositions of Section 2.3 are essentially unique, and from this canonicity
one can derive various constraints on which gates may appear in certain circuit equations.
We will pursue this route in Section 4.2. However, the simplest cases are easier to treat
from the more elementary point of view adopted by Song and Klappenecker in their
classification of two-qubit controlled-U operators by CNOT-cost [19]. Considering the operator
computed by a candidate circuit, they first focus on matrix elements which vanish if the
operator is a controlled-U. In order to produce such zero elements, the gates in the
candidate circuit must satisfy certain constraints. Below we derive a series of more general
results for n-qubit circuits. One-qubit gates which become diagonal when multiplied by X
occur frequently; we refer to them as anti-diagonal.

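Lemma 8 The following equation imposes at least one of the following constraints.

[Circuit diagram: $a^{(1)} P\, b^{(1)} = Q$, where $a, b$ are one-qubit gates on qubit 1 and $P, Q$ commute with $Z^{(1)}$.]

1. $a, b$ are both diagonal or both anti-diagonal.

2. $P$ takes the form $d \otimes P_0$ for some one-qubit diagonal $d$.

Proof.

$0 = \langle 0|^{(1)}\, a P b\, |1\rangle^{(1)} = \langle 0|a|0\rangle \langle 0|b|1\rangle P_0 + \langle 0|a|1\rangle \langle 1|b|1\rangle P_1$

As the coefficients do not vanish, $P_0$ and $P_1$ are linearly dependent. It follows that
$P = d \otimes P_0$ for some one-qubit diagonal $d$. ■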

Corollary 9 If $a^{(\ell)} CZ^{(\ell,m)} b^{(\ell)}$ commutes with $Z^{(\ell)}$ then $a, b$ are both diagonal or both anti-diagonal.

Corollary 10 In the situation of Lemma 8 there exist one-qubit operators $a', b'$ which are
either diagonal or anti-diagonal, such that $a'^{(1)} P\, b'^{(1)} = Q$.

Proof. Apply Lemma 8; we need consider only Case 2. Take $a' = a\, d\, b\, d^{-1}$ and $b' = I$; then
$a'^{(1)} P\, b'^{(1)} = a^{(1)} P\, b^{(1)}$. As $a'^{(1)} = Q P^\dagger$ commutes with $Z^{(1)}$, it is diagonal. ■

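We turn now to circuits with two CZ gates.

Lemma 11 Suppose the following equation holds.

[Circuit diagram: the product $a P b$ commutes with $Z^{(1)}$, where $a, b$ act on qubits 1 and 2 and
commute with $Z^{(2)}$, and $P$ does not act on qubit 1.]

Then (I) $a_i b_j$ is diagonal for all $i, j$ or (II) one of $P$, $X^{(2)}P$ commutes with $Z^{(2)}$.

Proof. We compute:

$0 = \langle 0|^{(1)} \langle i|^{(2)}\, a P b\, |1\rangle^{(1)} |j\rangle^{(2)} = \langle 0|\, a_i b_j\, |1\rangle\; \langle i|^{(2)} P |j\rangle^{(2)}$

Either $\langle i|^{(2)} P |j\rangle^{(2)} = 0$ for some $i, j$, or $\langle 0|\, a_i b_j\, |1\rangle$ vanishes for all $i, j$. ■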


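Corollary 12 Suppose the following equation holds.

[Circuit diagram: $M = [r \otimes R]\; CZ^{(1,2)}\; [s \otimes S]\; CZ^{(1,2)}\; [t \otimes T]$, where $r, s, t$ are one-qubit
gates on qubit 1, the gates $R, S, T$ act on the remaining qubits, and $M$ commutes with $Z^{(1)}$.]

Then either (I) an even number of $r, s, t$ are anti-diagonal, and the remainder diagonal, or
(II) $S$ or $SX^{(2)}$ commutes with $Z^{(2)}$.

Proof. In order to apply Lemma 11, we move $R$ and $T$ to the other side.

[Circuit diagram: the same equation with $R$ and $T$ moved to the side of $M$.]

The cases here will correspond to the cases of Lemma 11. Case II is preserved verbatim.
For Case I, the "$a_i b_j$" which must be diagonal are $rst$, $rsZt$, $rZst$, $rZsZt$. Since
$(rst)^\dagger\, rsZt = t^\dagger Z t$ is diagonal, we deduce that either $t$ or $tX$ is diagonal. Likewise,
$rZst\,(rst)^\dagger = rZr^\dagger$ is diagonal, so either $r$ or $rX$ is diagonal. Finally, $rst$ is diagonal, so
from what we know about $r, t$, either $s$ or $sX$ is diagonal, and the number of $r, s, t$ which are
not diagonal is even. ■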

The following reformulation will be useful later. 

Corollary 13 Suppose $Q$ commutes with $Z^{(\ell)}$ and let $\mathcal{C}$ be a $CZ^{(\ell)}$-circuit computing $Q$ in
which exactly two CZs are incident on $\ell$, say $CZ^{(\ell,m)}$ and $CZ^{(\ell,n)}$. Then all non-diagonal
one-qubit gates may be eliminated from qubit $\ell$ at the cost of possibly (i) replacing $CZ^{(\ell,n)}$
with $CZ^{(\ell,m)}$ and (ii) adding one-qubit gates on qubits $m, n$.

Proof. By hypothesis, $\mathcal{C}$ takes the form

$Q = [r \otimes R]\; CZ^{(\ell,m)}\; [s \otimes S]\; CZ^{(\ell,n)}\; [t \otimes T]$

where $r, s, t$ are subcircuits of one-qubit operators acting on $\ell$, and $R, S, T$ are subcircuits
containing no gates acting on $\ell$. We immediately replace $r, s, t$ by the one-qubit operators
they compute. Moreover, if $m \ne n$, then replace $S$ and $T$ by $S \cdot SWAP^{(m,n)}$ and $SWAP^{(m,n)} \cdot T$,
where SWAP is the gate which exchanges qubits. The swaps will be restored and canceled
at the end of the proof. We are in the situation of Lemma 11.

Case I. We are done, with the exception that the $r, s, t$ may be anti-diagonal rather than
diagonal. In this case, Equation 1 allows the extraneous Xs to be pushed through and
cancelled at the cost of introducing Z gates on qubit $m$. The diagonal gates remaining
on qubit $\ell$ may be commuted through the CZs and conglomerated into one. Finally, the
possible swap introduced between the $S, T$ terms may be cancelled.

Case II. Using Equation 1 and replacing $s$ by $sZ$ if necessary, we commute $S$ past one of
the CZs. We now have:

$Q = [r \otimes R]\; CZ^{(\ell,m)}\; s^{(\ell)}\; CZ^{(\ell,m)}\; [t \otimes ST]$

Rearranging the equation,

[Circuit diagram (8).]

Let $V$ be the value of either side of the equation above. Then from the LHS we see that $V$
commutes with $Z^{(\ell)}$, and from the RHS we see that $V$ is a two-qubit operator commuting
with $Z^{(m)}$. Thus $V$ is a two-qubit diagonal, and admits the following decomposition.

[Circuit diagram: $V$ equals $R_z(\alpha)$ on qubit $\ell$ and $R_z(\beta)$ on qubit $m$, together with two CZ
gates between $\ell$ and $m$, each conjugated by H on qubit $m$, with $R_z(\gamma)$ on qubit $m$ between them.]

Substituting this decomposition for the RHS of Equation 8 and restoring the $R, S, T$ gates
completes the proof. ■



4 The CNOT-cost of the TOFFOLI gate 



So far we have reduced CNOT-counting for the TOFFOLI gate to CZ-counting for the CCZ
gate, with the latter two being diagonal and symmetric. Having derived the inequality
$\frac{3}{2}\, |CCZ|_{CZ;\ell} \le |CCZ|_{CZ}$, we seek to determine the qubit-local costs $|CCZ|_{CZ;\ell}$.

The idea is to find an equivalence relation $\sim_\ell$ such that (i) $U \sim_\ell V \Rightarrow |U|_{CZ;\ell} = |V|_{CZ;\ell}$
and (ii) the equivalence classes of $\sim_\ell$ are easy to characterize.



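Definition 14 For $P, Q$ commuting with $Z^{(\ell)}$, we write $P \sim_\ell Q$ if there exist $a, b, A, B$
satisfying the following equation.

[Circuit diagram (9): $Q = [a \otimes A]\; P\; [b \otimes B]$, where $a, b$ are one-qubit gates on qubit $\ell$ and
$A, B$ act on the remaining qubits.]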



The fact that $|\cdot|_{CZ;\ell}$ is constant on equivalence classes is obvious; the ability to
characterize the equivalence classes comes from a comparison between Equation 9 and the
demultiplexing decomposition of Equation 3. We construct invariants of the equivalence
classes in Theorem 17. The reductions of Section 4.2 provide circuit forms on which the
invariants are easy to compute; as a consequence, we arrive at a complete characterization
of $U$ such that $|U|_{CZ;\ell} = 0, 1, 2$ in Theorem 18. The CCZ gate falls into none of these classes,
and thus $|CCZ|_{CZ;\ell} \ge 3$, and hence $|CCZ|_{CZ} \ge 5$. Unfortunately, qubit-local CZ-counting can
take us no further: one can show by construction that in fact $|CCZ|_{CZ;\ell} = 3$.
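Spelled out, the step from the qubit-local bound to the global one combines Equation 7 with the fact that a CZ count is an integer:

```latex
|CCZ|_{CZ} \;\ge\; \tfrac{3}{2}\,|CCZ|_{CZ;\ell} \;\ge\; \tfrac{3}{2}\cdot 3 \;=\; 4.5
\quad\Longrightarrow\quad
|CCZ|_{CZ} \;\ge\; \lceil 4.5 \rceil \;=\; 5 .
```

Since the local cost equals 3 exactly, this argument alone cannot rule out a five-CZ circuit, which is why the remainder of the section is needed.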

We now consider a hypothetical five-CZ circuit for the CCZ and seek a contradiction,
using a divide-and-conquer strategy. There are many possible arrangements of the CZs, and
we do not deal with them case by case. Nonetheless, we fix one here for clarity.

[Circuit diagram: one arrangement of five CZ gates and one-qubit gates on three qubits.]

We define $a, b, P, Q$ as follows.

[Circuit diagram (10): the subcircuits $a, b, P, Q$ of the five-CZ circuit.]

Our circuit decomposition now takes the following form.

[Circuit diagram (11).]

Up to some two-qubit diagonal fudge factors, this equation describes the cosine-sine
decomposition of a related operator in terms of $a$, $b$, $P$ and $Q$. In Section 4.2 we translate the
well-known canonicity of this Cartan decomposition into constraints on the components
$a$, $b$, $P$ and $Q$. The formulae of Theorem 18 further strengthen these constraints in the
$|\cdot|_{CZ;\ell} = 3$ case. Specifically,
we show in Theorem 22 that if $|U|_{CZ;\ell} = 3$ and $\mathcal{C}$ computes $U$ using the minimum required
three CZ gates incident on $\ell$, then all one-qubit gates on $\ell$ are diagonal or anti-diagonal.
The anti-diagonal gates can be made diagonal at the cost of introducing Z gates elsewhere
in the circuit.

This is the last result needed to determine the CZ-cost of the CCZ. From $|CCZ|_{CZ;\ell} \ge 3$,
we see that in any five-CZ circuit for the CCZ, two of the qubits, $m, n$, touch exactly three CZ
gates and the remaining one touches four. By Theorem 22 we can assume all one-qubit
operators on $m, n$ are diagonal. Proposition 5 would then require $\det_{m,n} CCZ = CZ^{(m,n)}$ to be
separable, which it is not.

Theorem 15 $|CCZ|_{CZ} = 6$.

We show in Section 6 that the use of ancillae cannot lower the CZ-cost of the CCZ.
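The final step of the argument above rests on two small facts that are easy to confirm numerically: the partial determinant of the CCZ over two of its qubits is the CZ on those qubits, and the CZ is not a tensor product of one-qubit diagonals. The sketch below assumes that $m, n$ are the two most significant qubits, and it uses the elementary criterion that a two-qubit diagonal $\mathrm{diag}(d_{00}, d_{01}, d_{10}, d_{11})$ factors as a tensor product exactly when $d_{00} d_{11} = d_{01} d_{10}$. Function names are hypothetical.

```python
import numpy as np

CCZ = np.diag([1.0] * 7 + [-1.0])
CZ = np.diag([1.0, 1.0, 1.0, -1.0])


def partial_det(U, k):
    """Diagonal matrix of determinants of the 2**k diagonal blocks of U."""
    block = U.shape[0] >> k
    return np.diag([
        np.linalg.det(U[j * block:(j + 1) * block, j * block:(j + 1) * block])
        for j in range(2 ** k)
    ])


def is_separable_diagonal(D, tol=1e-12):
    """True if the 4x4 diagonal D equals diag(p, q) tensor diag(r, s)."""
    d00, d01, d10, d11 = np.diag(D)
    return abs(d00 * d11 - d01 * d10) < tol


det_mn = partial_det(CCZ, 2)

print(np.allclose(det_mn, CZ))        # det_{m,n}(CCZ) = CZ^{(m,n)}
print(is_separable_diagonal(det_mn))  # CZ is not separable
print(is_separable_diagonal(np.kron(np.diag([1.0, 1j]), np.diag([1.0, -1.0]))))
```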

4.1 CZ counting via the demultiplexing decomposition 

We now turn to the study of qubit-local CZ-cost. To apply $P \sim_\ell Q \Rightarrow |P|_{CZ;\ell} = |Q|_{CZ;\ell}$,
we first seek to determine when $P \sim_\ell Q$. This will be done under the assumption that $P$
and $Q$ both commute with $Z^{(\ell)}$.

Definition 16 Let $U$ commute with $Z^{(\ell)}$. Then the $\ell$-mux-spectrum $\mathfrak{S}^{(\ell)}(U)$ is the
multi-set of eigenvalues, taken with multiplicity, of $U_1^\dagger U_0$. Two multi-sets $S, T$ are said to be
congruent, $S \cong T$, if there exists a nonzero scalar $\lambda$ such that either $\lambda S = T$ or $\lambda S = T^*$.

We note that before taking the $\ell$-mux-spectrum of $U$, it is necessary to fix the number
of qubits on which $U$ acts: $\mathfrak{S}^{(\ell)}(U \otimes I)$ contains $\dim I$ copies of $\mathfrak{S}^{(\ell)}(U)$.
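A short numerical sketch may help make the definition concrete. It assumes that $\ell$ is the most significant qubit, so that $U_0$ and $U_1$ are the top-left and bottom-right blocks of the matrix, and it evaluates the spectrum of $U_1^\dagger U_0$ for the identity, the CZ, the CZ padded with one spectator qubit, and the CCZ. The function name `mux_spectrum` is hypothetical.

```python
import numpy as np


def mux_spectrum(U):
    """Eigenvalues of U_1^dagger U_0, with the most significant qubit as select.

    U is assumed to commute with Z on that qubit, i.e. to be block-diagonal
    with two equal-sized blocks U_0 (top left) and U_1 (bottom right).
    """
    half = U.shape[0] // 2
    U0 = U[:half, :half]
    U1 = U[half:, half:]
    return np.linalg.eigvals(U1.conj().T @ U0)


CZ = np.diag([1.0, 1.0, 1.0, -1.0])
CCZ = np.diag([1.0] * 7 + [-1.0])

spec_id = mux_spectrum(np.eye(4))
spec_cz = mux_spectrum(CZ)
spec_cz_padded = mux_spectrum(np.kron(CZ, np.eye(2)))  # CZ tensor I on 3 qubits
spec_ccz = mux_spectrum(CCZ)

for name, spec in [("I", spec_id), ("CZ", spec_cz),
                   ("CZ x I", spec_cz_padded), ("CCZ", spec_ccz)]:
    print(name, np.sort(spec.real))
```

The padded CZ shows the remark about fixing the number of qubits: each eigenvalue of the two-qubit spectrum appears twice once a one-qubit identity is tensored on.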

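Theorem 17 Suppose $P, Q$ commute with $Z^{(\ell)}$. Then $P \sim_\ell Q \iff \mathfrak{S}^{(\ell)}(P) \cong \mathfrak{S}^{(\ell)}(Q)$.

Proof. (⇒). As $P \sim_\ell Q$, there are gates $a, b, A, B$ such that Equation 9 holds.

By Corollary 9, we may assume that either $a, b$ or $aX, bX$ are diagonal. In the first case,
$Q_0 = a_0 b_0\, A P_0 B$ and $Q_1 = a_1 b_1\, A P_1 B$. Thus $Q_1^\dagger Q_0 = (a_1 b_1)^\dagger a_0 b_0\; B^\dagger P_1^\dagger P_0 B$, which has the
same eigenvalues as $(a_1 b_1)^\dagger a_0 b_0\; P_1^\dagger P_0$. Thus $\mathfrak{S}^{(\ell)}(P) \cong \mathfrak{S}^{(\ell)}(Q)$.

Otherwise, $a' = aX$ and $b' = Xb$ are diagonal. Now $Q_1^\dagger Q_0 = (a'_1 b'_1)^\dagger a'_0 b'_0\; B^\dagger P_0^\dagger P_1 B$, which
has the same eigenvalues as $(a'_1 b'_1)^\dagger a'_0 b'_0\; P_0^\dagger P_1$, whose eigenvalues in turn are the complex
conjugates of those of $a'_1 b'_1 (a'_0 b'_0)^\dagger\; P_1^\dagger P_0$; again $\mathfrak{S}^{(\ell)}(P) \cong \mathfrak{S}^{(\ell)}(Q)$.

(⇐). By supposition, $\mathfrak{S}^{(\ell)}(P) \cong \mathfrak{S}^{(\ell)}(Q)$. We note $\mathfrak{S}^{(\ell)}(X^{(\ell)} P X^{(\ell)}) = \mathfrak{S}^{(\ell)}(P)^*$, and
$\mathfrak{S}^{(\ell)}(R_z^{(\ell)}(\lambda) P) = e^{2i\lambda}\, \mathfrak{S}^{(\ell)}(P)$. Therefore we can readily find an operator $P' \sim_\ell P$ such that the
$\ell$-mux-spectrum of $P'$ is identical, rather than merely congruent, to that of $Q$. It remains to
show that $P' \sim_\ell Q$.

By the demultiplexing decomposition (Equation 3) there exist unitary operators $M_P, N_P$
and a real diagonal matrix $\delta_P$, all of which operate on the qubits other than $\ell$, such that
$P' = [I \otimes M_P]\, R_z^{(\ell)}(\delta_P)\, [I \otimes N_P]$. Likewise we decompose $Q = [I \otimes M_Q]\, R_z^{(\ell)}(\delta_Q)\, [I \otimes N_Q]$. If we
let $\Delta_P = \exp(i\delta_P)$ and $\Delta_Q = \exp(i\delta_Q)$, then the $\ell$-mux-spectra of $P'$ and $Q$ are respectively
the entries of $\Delta_P^2$ and $\Delta_Q^2$. Since $\mathfrak{S}^{(\ell)}(P') = \mathfrak{S}^{(\ell)}(Q)$, there must exist a permutation matrix
$\pi$ acting on the qubits other than $\ell$ such that $\pi \Delta_P^2 \pi^\dagger = \Delta_Q^2$. Rearranging, we have
$\Delta_Q^\dagger \pi \Delta_P = \Delta_Q \pi \Delta_P^\dagger$. Writing $K$ for this term,
$[I \otimes M_Q K M_P^\dagger]\; P'\; [I \otimes N_P^\dagger \pi^\dagger N_Q] = Q$. Thus $P' \sim_\ell Q$. ■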
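The transformation rules used in the proof can be checked on a random multiplexor. The sketch below assumes that $\ell$ is the most significant of three qubits and uses randomly generated unitary blocks; it tests the two rules $\mathfrak{S}(X P X) = \mathfrak{S}(P)^*$ and $\mathfrak{S}(R_z(\lambda) P) = e^{2i\lambda}\mathfrak{S}(P)$, as well as the first case of the forward direction, where $Q = [a \otimes A] P [b \otimes B]$ with diagonal $a, b$. All names and sample values are illustrative.

```python
import numpy as np


def mux_spectrum(U):
    """Eigenvalues of U_1^dagger U_0 with the most significant qubit as select."""
    half = U.shape[0] // 2
    return np.linalg.eigvals(U[half:, half:].conj().T @ U[:half, :half])


def same_multiset(a, b, tol=1e-8):
    """Compare two multi-sets of complex numbers up to a tolerance."""
    remaining = list(b)
    for z in a:
        for k, w in enumerate(remaining):
            if abs(z - w) < tol:
                remaining.pop(k)
                break
        else:
            return False
    return not remaining


def random_unitary(d, rng):
    q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return q


rng = np.random.default_rng(0)
zero = np.zeros((4, 4))
P = np.block([[random_unitary(4, rng), zero], [zero, random_unitary(4, rng)]])
spec = mux_spectrum(P)

# Rule 1: conjugating by X on the select qubit conjugates the spectrum.
X_sel = np.kron(np.array([[0.0, 1.0], [1.0, 0.0]]), np.eye(4))
rule_x = same_multiset(mux_spectrum(X_sel @ P @ X_sel), spec.conj())

# Rule 2: R_z(lam) = exp(i Z lam) on the select qubit rotates the spectrum by exp(2i lam).
lam = 0.37
Rz_sel = np.kron(np.diag([np.exp(1j * lam), np.exp(-1j * lam)]), np.eye(4))
rule_rz = same_multiset(mux_spectrum(Rz_sel @ P), np.exp(2j * lam) * spec)

# Forward direction, diagonal case: Q = [a (x) A] P [b (x) B] rescales the spectrum.
a = np.diag(np.exp(1j * np.array([0.2, 0.9])))
b = np.diag(np.exp(1j * np.array([-0.5, 1.3])))
A, B = random_unitary(4, rng), random_unitary(4, rng)
Q = np.kron(a, A) @ P @ np.kron(b, B)
scale = np.conj(a[1, 1] * b[1, 1]) * a[0, 0] * b[0, 0]
rule_equiv = same_multiset(mux_spectrum(Q), scale * spec)

print(rule_x, rule_rz, rule_equiv)
```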

We now apply Theorem 17 to prove the following result relating $\mathfrak{S}^{(\ell)}(P)$ and $|P|_{CZ;\ell}$.
We emphasize that the number of qubits on which $P$ acts is an unspecified parameter in
both of these functions.

Theorem 18 Let $P$ commute with $Z^{(\ell)}$.

• $|P|_{CZ;\ell} = 0$ iff $\mathfrak{S}^{(\ell)}(P) \cong \{1, 1, \ldots\}$.

• $|P|_{CZ;\ell} = 1$ iff $\mathfrak{S}^{(\ell)}(P) \cong \{1, -1, 1, -1, \ldots\}$.

• $|P|_{CZ;\ell} \le 2$ iff $\mathfrak{S}^{(\ell)}(P)$ is congruent to some multi-set $S$ of unit norm complex numbers
which come in conjugate pairs.
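The third criterion can be turned into a small numerical test. The sketch below is one possible implementation, not the authors' procedure: it assumes $\ell$ is the most significant qubit, reads "come in conjugate pairs" as "can be partitioned into pairs $\{z, \bar z\}$", and searches for the rescaling $\lambda$ among the finitely many candidates with $\lambda^2 = 1/(s_0 s_j)$, since the first element must be paired with some other element. Applied to the CCZ it reports that no rescaling works, which is the numerical counterpart of $|CCZ|_{CZ;\ell} \ge 3$.

```python
import numpy as np


def mux_spectrum(U):
    """Eigenvalues of U_1^dagger U_0 with the most significant qubit as select."""
    half = U.shape[0] // 2
    return np.linalg.eigvals(U[half:, half:].conj().T @ U[:half, :half])


def pairs_up(values, tol=1e-9):
    """True if the multi-set can be partitioned into pairs {z, conj(z)}."""
    remaining = list(values)
    while remaining:
        z = remaining.pop()
        for k, w in enumerate(remaining):
            if abs(w - np.conj(z)) < tol:
                remaining.pop(k)
                break
        else:
            return False
    return True


def congruent_to_conjugate_pairs(spec, tol=1e-9):
    """True if lam * spec consists of conjugate pairs for some scalar lam."""
    spec = np.asarray(spec, dtype=complex)
    if len(spec) % 2:
        return False
    s0 = spec[0]
    for sj in spec[1:]:
        # If s0 is paired with sj after rescaling, then lam**2 * s0 * sj = 1.
        root = np.sqrt(1.0 / (s0 * sj))
        if any(pairs_up(lam * spec, tol) for lam in (root, -root)):
            return True
    return False


CZ = np.diag([1.0, 1.0, 1.0, -1.0])
CCZ = np.diag([1.0] * 7 + [-1.0])

# A spectrum of conjugate pairs, multiplied by an arbitrary phase (sample values).
shifted_pairs = np.exp(0.7j) * np.exp(1j * np.array([0.3, -0.3, 1.1, -1.1]))

cz_ok = congruent_to_conjugate_pairs(mux_spectrum(CZ))
ccz_ok = congruent_to_conjugate_pairs(mux_spectrum(CCZ))
shifted_ok = congruent_to_conjugate_pairs(shifted_pairs)
print(cz_ok, ccz_ok, shifted_ok)
```

The CZ passes because its spectrum $\{1, -1\}$ becomes $\{i, -i\}$ after multiplication by $i$; the CCZ spectrum $\{1, 1, 1, -1\}$ fails because three equal values can never be split into conjugate pairs together with a single different one.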

Proof. The first and second statements follow immediately from Theorem [FT] and the 
calculations 3 w (/) = { 1 , 1 , . . . } and 3 {e) (CZ^ ) = {1,-1,1,-1,...}. To perform the 
relevant calculation for the third statement, we will use Corollary [13] 

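Let $\ell$ be the most significant qubit. For $\delta$ a diagonal real operator acting on all qubits
but $\ell$, define $\Phi(\delta)$ by

[Circuit diagram: definition of $\Phi(\delta)$, built around the multiplexed rotation $R_y(\delta)$.]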

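By construction, $|\Phi(\delta)|_{CZ;\ell} \le 2$. We compute $\mathfrak{Z}^{(\ell)}(\Phi(\delta)) = \{e^{2i\delta_0}, e^{-2i\delta_0}, e^{2i\delta_1}, e^{-2i\delta_1}, \ldots\}$.

($\Leftarrow$) Write the entries of $S$ as $\{e^{i\theta_0}, e^{-i\theta_0}, e^{i\theta_1}, e^{-i\theta_1}, \ldots\}$, and let $\theta$ be the real diagonal
operator acting on all qubits but $\ell$ whose diagonal entries are $\theta_0, \theta_1, \ldots$. By construction,
$\mathfrak{Z}^{(\ell)}(\Phi(\theta/2)) = S$, and $S \cong \mathfrak{Z}^{(\ell)}(Q)$ by hypothesis. By Theorem [17], $\Phi(\theta/2)$ and $Q$
are $\ell$-equivalent. It follows that $|Q|_{CZ;\ell} = |\Phi(\theta/2)|_{CZ;\ell} \le 2$.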

($\Rightarrow$) By hypothesis $|Q|_{CZ;\ell} \le 2$. If in fact $|Q|_{CZ;\ell} = 0, 1$, note by the first two statements
of the Theorem, which have been proven, the $\ell$-mux-spectrum of $Q$ has the desired property.
Thus we assume $|Q|_{CZ;\ell} = 2$. Let $\mathcal{C}$ be a circuit in which this minimal CZ count is
achieved. By Corollary [13], we can find an equivalent circuit $\mathcal{C}'$ of the following form.

[Circuit diagram: the circuit $\mathcal{C}'$ for $Q$, containing the operators $A$, $B$, $C$ and two CZ gates.]


We have drawn the CZs with different lower contacts, but of course they might be the 
same. Actually, we prefer the latter case, and ensure it by incorporating swaps into B,C if 
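necessary. We take a cosine-sine decomposition (see Equation [2]) of $B$:

[Circuit diagram: the circuit for $Q$ after substituting the cosine-sine decomposition of $B$, in which $R_y(\beta)$ appears between $B_L$ and $B_R$.]
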

Note that the $B_L$ and $B_R$ gates commute with the CZs. Thus $Q \sim_\ell \Phi(\beta)$. By Theorem [17],
$\mathfrak{Z}^{(\ell)}(Q) \cong \mathfrak{Z}^{(\ell)}(\Phi(\beta))$. But we have already seen that $\mathfrak{Z}^{(\ell)}(\Phi(\cdot))$ always consists of
conjugate pairs of unit-norm complex numbers. ■
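
The three criteria of Theorem 18 can be turned into a small numerical test on a given mux-spectrum. The sketch below is only an illustration: the function names and the tolerance are hypothetical, and it assumes that congruence of multi-sets means equality up to a global unit scalar (complex conjugation of the whole multi-set does not change any of the three properties tested).

```python
import cmath

TOL = 1e-9  # numerical tolerance (an assumption of this sketch)


def _close(z, w):
    return abs(z - w) < TOL


def _pairs_up(values):
    """True if the multi-set splits into pairs {w, conj(w)}."""
    remaining = list(values)
    while remaining:
        w = remaining.pop()
        for idx, v in enumerate(remaining):
            if _close(v, w.conjugate()):
                remaining.pop(idx)
                break
        else:
            return False
    return True


def cz_count_from_mux_spectrum(spectrum):
    """Return 0, 1 or 2 according to the criteria of Theorem 18,
    or None when none of them holds (more than two CZs are needed)."""
    z0 = spectrum[0]
    scaled = [z / z0 for z in spectrum]  # remove a global unit scalar
    if all(_close(z, 1) for z in scaled):
        return 0
    plus = sum(_close(z, 1) for z in scaled)
    minus = sum(_close(z, -1) for z in scaled)
    if plus == minus and plus + minus == len(scaled):
        return 1
    # If c * spectrum splits into conjugate pairs, z0 is paired with some zj,
    # which forces c^2 * z0 * zj = 1; the sign of c is irrelevant.
    for zj in spectrum[1:]:
        c = 1 / cmath.sqrt(z0 * zj)
        if _pairs_up([c * z for z in spectrum]):
            return 2
    return None
```

For instance, the multi-set {1, 1, 1, -1} fails all three tests, while {1, 1, 1, -1, 1, 1, 1, -1} splits into conjugate pairs.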
4.2 Circuit constraints from the cosine-sine decomposition



This section is devoted to the study of Equation [11]. We take cosine-sine decompositions of
$a, b$. Below, $A_L, A_R, B_L, B_R$ are two-qubit diagonal operators, and $\alpha, \beta$ are $2 \times 2$ real diagonal
matrices of angular parameters.

[Circuit equations (12) and (13): the cosine-sine decompositions of $a$ and $b$ on qubits 1 and 2, with middle factors $R_y(\alpha)$ and $R_y(-\beta)$ respectively.]



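Define $\tilde{P} = A_L P B_R$ and $\tilde{Q} = A_R^\dagger Q B_L^\dagger$ to obtain:

[Circuit equation (14): the relation between $\tilde{P}$ and $\tilde{Q}$, involving $R_y(-\beta)$ and $R_y(\alpha)$.]
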



We recall the standard argument used to measure the uniqueness of the KAK decomposition
HI. Throughout this discussion, we will write simply $R_y(\alpha)$ for $R_y^{(1)}(\alpha^{(2)})$, and
similarly for $R_y(\beta)$. Rearrange the equation to obtain $\tilde{Q}^\dagger R_y(\alpha) \tilde{P} = R_y(\beta)$. Transforming
the equation by $\kappa \mapsto Z^{(1)} \kappa^\dagger Z^{(1)}$, we get $\tilde{P}^\dagger R_y(\alpha) \tilde{Q} = R_y(\beta)$. Multiplying these equations
yields $\tilde{P}^\dagger R_y(2\alpha) \tilde{P} = R_y(2\beta)$. Thus $R_y(2\alpha)$ and $R_y(2\beta)$ have the same eigenvalues.
One can check that in fact they are conjugate under an element of the group $W$ generated
by $X^{(2)}$ and $CZ^{(1,2)}$; note that these operators commute with $Z^{(1)}$. That is, there exists
$w \in W$ such that $w R_y(2\alpha) w^\dagger = R_y(2\beta)$. Now let $t = w R_y(\alpha) w^\dagger R_y(-\beta)$. We have both
$t = R_y(\varepsilon)$ for some $2 \times 2$ real diagonal matrix $\varepsilon$ acting on qubit 2, and $t^2 = I$; it follows
that $t \in \{\pm I, \pm Z^{(2)}\}$. Defining $\hat{P} = \tilde{P} \cdot [tw \otimes I]$ and $\hat{Q} = \tilde{Q} \cdot [w \otimes I]$ reduces our equation to
the following.

[Circuit equation (15): the relation between $\hat{P}$ and $\hat{Q}$, involving $R_y(-\alpha)$ and $R_y(\alpha)$.]


By an argument similar to that given for $\tilde{P}$ and $\tilde{Q}$, the operators $\hat{P}$ and $\hat{Q}$ both commute
with $R_y(2\alpha)$. Conjugation by $R_y(\alpha)$ is an involution on the set of operators commuting
with $R_y(2\alpha)$; Equation [15] says that $\hat{P}$ and $\hat{Q}$ are interchanged by this involution. In fact,
this involution always has a simpler description:

Lemma 19 Equation [15] also holds for some $\alpha$ for which $\alpha_i$ is an integer or half-integer
multiple of $\pi$. Half-integers occur if and only if $2\alpha_i$ is an odd integer multiple of $\pi$.



Proof. Decompose $2\alpha_i = \phi_i + \gamma_i \pmod{2\pi}$ where $\phi_i \in (-\pi, \pi)$, where $\gamma_i = 0$ unless
$\phi_i = 0$, and $\gamma_i \in \{0, \pi\}$ in any event. Then any operator which commutes with $R_y(2\alpha)$
also commutes with $R_y(\phi/2)$. Thus, on operators commuting with $R_y(2\alpha)$, conjugation
by $R_y(\alpha)$ is the same as conjugation by $R_y(\alpha - \phi/2) = R_y(\alpha - \phi/2 - \gamma/2) R_y(\gamma/2)$. But
$2(\alpha - \phi/2 - \gamma/2) = 0 \pmod{2\pi}$. ■

We also record the constraints imposed on possible $\hat{P}, \hat{Q}$ by the value of $\theta = 2\alpha$.

Lemma 20 Fix distinct qubits $\ell, m$. Let $U$ be a unitary operator commuting with $Z^{(\ell)}$, and
let $\theta$ be a two-by-two real diagonal matrix of angular parameters which is understood to
operate on $m$. Then $U$ commutes with $R_y^{(\ell)}(\theta)$ if and only if one of the following holds:

1. $\cos(\theta)$ is scalar, and either

(a) $\sin(\theta) = 0$.

(b) $\sin(\theta)$ is a nonzero scalar and $U_0 = U_1$.

(c) $Z \sin(\theta)$ is a nonzero scalar and $U_0 = Z^{(m)} U_1 Z^{(m)}$.

2. $\cos(\theta)$ is not scalar, $U$ commutes with $Z^{(m)}$, and either

(a) $\sin(\theta_0) = 0$ and $\sin(\theta_1) = 0$.

(b) $\sin(\theta_0) = 0$ and $\sin(\theta_1) \ne 0$ and $U_{01} = U_{11}$.

(c) $\sin(\theta_0) \ne 0$ and $\sin(\theta_1) = 0$ and $U_{00} = U_{10}$.

(d) $\sin(\theta_0) \ne 0$ and $\sin(\theta_1) \ne 0$ and $U_0 = U_1$.

Proof. The ($\Leftarrow$) direction is trivial. For ($\Rightarrow$), suppose $[R_y^{(\ell)}(\theta^{(m)}), U] = 0$ and expand
using the expression $R_y^{(\ell)}(\theta^{(m)}) = \exp(i Y^{(\ell)} \theta^{(m)}) = \cos(\theta)^{(m)} + i Y^{(\ell)} \sin(\theta)^{(m)}$ in order to
observe that $U_0$ and $U_1$ both commute with $\cos(\theta)^{(m)}$, and $U_0 \sin(\theta)^{(m)} = \sin(\theta)^{(m)} U_1$.
Now repeatedly apply the fact that two-by-two matrices which commute with a two-by-two
diagonal matrix with distinct entries are themselves diagonal. ■

Finally, we translate these results back to the original operators P, Q. 

Lemma 21 In the situation of Equation [11], at least one of the following must hold.

1. Either $a, b$ are diagonal or $aX^{(1)}, bX^{(1)}$ are diagonal.

2. There exists a two-qubit operator $U$ and two-qubit diagonals $D, D'$ such that

[Circuit diagram: $P$ expressed through $D'$, $U$, and $D$.]

Similarly, there exists a two-qubit operator $V$ and two-qubit diagonals $C, C'$ such that

[Circuit diagram: $Q$ expressed through $C'$, $V$, and $C$.]

3. Either $P$ or $PX^{(2)}$ commute with $Z^{(2)}$. There exist replacements $a', b'$ for $a, b$ which
are in the subgroup generated by two-qubit diagonal operators on qubits 1 and 2 and
$C^{(2)}X^{(1)}$, and such that Equation [11] continues to hold.



Proof. This amounts to unwinding the above discussion in light of Lemma [20]. Case I
comes from Case 1a of the Lemma; the $X$ appears because of the 2 in $\theta = 2\alpha$. Case II
comes from Cases 1b and 1c. The first claim in Case III is just Case 2 of the Lemma;
the possible $X$ here comes from the $w$ factor in $\hat{P} = \tilde{P} t w$ from the discussion above. The
second claim follows from Lemma [19]. ■



While we cannot completely characterize operators with $|\cdot|_{CZ;\ell} = 3$, we can characterize
the circuits, minimal in the number of CZs touching $\ell$, which compute them.

Theorem 22 Fix a qubit $\ell$, and suppose $M$ commutes with $Z^{(\ell)}$. Suppose $|M|_{CZ;\ell} = 3$, and
let $\mathcal{C}$ be a CZ-circuit exhibiting this bound. Then all one-qubit gates of $\mathcal{C}$ on $\ell$ are
diagonal or anti-diagonal.



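Proof. Consider $M$, $\mathcal{C}$ satisfying the hypothesis. Without loss of generality, $\ell = 1$ and $\mathcal{C}$
takes the form

[Circuit diagram: $\mathcal{C}$ drawn with three CZ gates touching qubit 1, one-qubit gates $e, f, g, h$ on qubit 1, and operators $E, F, G, H$ on the remaining qubits.]
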

The CZs may have originally had different terminals, but we can incorporate swaps into 
E,F,G,H to suppress this behavior. This affects neither the hypothesis nor the conclusion. 
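(*) Define $P$ by

[Circuit diagram: definition of $P$ as a subcircuit of $\mathcal{C}$ involving $G$ and $F$.]

If $PX^{(2)}$ commutes with $Z^{(2)}$, then return to (*) and replace $G$ by $GX^{(2)}$, $H$ by $X^{(2)}H$, and $h$
by $Z^{(1)}h$. This does not affect the conclusion, and by Equation [1] the resulting circuit still
computes $M$. We have ensured that if one of $P, PX^{(2)}$ commutes with $Z^{(2)}$, then it is $P$.
Define $a, b, Q$ by

[Circuit diagram: definitions of $a$, $b$, and $Q$ as subcircuits of $\mathcal{C}$.]
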

Note $|Q|_{CZ;1} = |M|_{CZ;1}$. We also have $Q = [a \otimes I] P [b \otimes I]$, hence are in the situation of
Equation [11]. Lemma [21] allows us to reduce to the following cases.

Case I. $a, b$ are diagonal, or $aX^{(1)}, bX^{(1)}$ are diagonal. In either case, Corollary [9] applied to
the circuits defining $a, b$ shows that $e, f, g, h$ are each diagonal or anti-diagonal.

Case II. $Q$ takes the form

[Circuit diagram: $Q$ expressed through the two-qubit diagonals $C'$, $C$ and the two-qubit operator $V$.]


The cosine-sine decomposition (see Equation [2]) of $V$ along qubit 2 determines unitary
operators $R, S$ and a real diagonal operator $\delta$ such that:

[Circuit equation (16): the cosine-sine decomposition of $V$ along qubit 2.]

We substitute, commute the $S, T$ outwards past $C, C'$, and decompose the diagonals $C, C'$.

[Circuit diagram: the resulting circuit for $Q$.]

Evidently $\mathfrak{Z}^{(1)}(Q)$ depends only on $\theta, \delta, \phi$. We calculate that, up to a global scalar multiple,
$\mathfrak{Z}^{(1)}(Q)$ consists of the roots of the following quadratics in $T$:

$$T^2 - 2T\left(\cos(2\theta + 2\phi)\cos(\delta_i)^2 + \cos(2\theta - 2\phi)\sin(\delta_i)^2\right) + 1$$

The equations being real, each has complex conjugate roots. By Theorem [18], $|M|_{CZ;1} =
|Q|_{CZ;1} \le 2$, contrary to hypothesis.
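
The claim that each quadratic has a conjugate pair of unit-norm roots can be checked numerically. The following sketch assumes the quadratic has the form $T^2 - 2cT + 1$ with $c$ the real coefficient displayed above; the function names are hypothetical. Since $c$ is a convex combination of two cosines, $|c| \le 1$, so the roots are $c \pm i\sqrt{1 - c^2}$.

```python
import cmath
import math


def coefficient(theta, phi, delta):
    """The real number c in T^2 - 2 c T + 1 (a convex combination of two cosines)."""
    return (math.cos(2 * theta + 2 * phi) * math.cos(delta) ** 2
            + math.cos(2 * theta - 2 * phi) * math.sin(delta) ** 2)


def quadratic_roots(c):
    """Both roots of T^2 - 2 c T + 1 for real c."""
    disc = cmath.sqrt(c * c - 1)  # purely imaginary whenever |c| < 1
    return c + disc, c - disc
```

For any choice of angles the two roots returned are complex conjugates of modulus one, up to rounding.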

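Case III. We have already ensured that $P$, rather than $PX^{(2)}$, commutes with $Z^{(2)}$. We
replace $a, b$ by the $a', b'$ of Lemma [21]. We demultiplex $P$ (see Equation [3]) to obtain a
decomposition of the following form, where $D$ is diagonal.

[Circuit diagram: demultiplexed form of $P$, consisting of the diagonal $D$ together with the operators $S$ and $R$.]
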

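The operators $S, R$ commute past $a', b'$ to the edges of the circuit, and thus do not affect
the CZ-cost of $Q$. That is, $|Q|_{CZ;\ell} = |[a' \otimes I] D [b' \otimes I]|_{CZ;\ell}$.

By construction, $|P|_{CZ;\ell} = |D|_{CZ;\ell} = 1$. If $D = |0\rangle\langle 0|^{(\ell)} \otimes D_0 + |1\rangle\langle 1|^{(\ell)} \otimes D_1$, Theorem
[18] asserts the entries of $D_0 D_1^\dagger$ are $\cong \{1, -1, 1, -1, \ldots\}$. Thus $D$ can be written as

[Circuit diagram: $D$ written in terms of an $R_z$ rotation, the permutation $\pi$, and $D_0$.]
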

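for some permutation $\pi$. We set $N := X^{(1)}([a' \otimes I] D [b' \otimes I])^\dagger X^{(1)} [a' \otimes I] D [b' \otimes I]$, so that
$\mathfrak{Z}^{(1)}([a' \otimes I] D [b' \otimes I])$ is given by the entries of $\langle 0|^{(1)} N |0\rangle^{(1)}$. Evidently $D_0$ commutes past
$a'$ and cancels with $D_0^\dagger$. Applying Equation [1] to eliminate $X$ gates, the following circuit
computes $N$.

[Circuit diagram: circuit computing $N$, containing two CZ gates, $R_z$ rotations, the permutation $\pi$ and its inverse.]
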

The condition on $a'$ implies that $(a')^\dagger X^{(1)} a' X^{(1)}$ is diagonal. It follows that the subcircuit
sandwiched between the two CZs computes a diagonal operator, and so the CZs cancel.
Then the $\pi, \pi^\dagger$ pair on the left cancel. The $\pi^\dagger Z^{(2)} \pi$ term on the right commutes past the
$(b')^\dagger$. What remains is a circuit of the form

[Circuit diagram: the remaining circuit on qubits 1 and 2, containing the operator $F$.]


By construction, $N$ commutes with both $Z^{(1)}$ and $Z^{(2)}$. It follows that $F$ is diagonal. Then
$f = \langle 0|^{(1)} F |0\rangle^{(1)}$ is some one-qubit diagonal acting on $m$. We have
$\langle 0|^{(1)} N |0\rangle^{(1)} = \pi^\dagger Z^{(2)} \pi f^{(m)}$. Denote by $f_0, f_1$ the entries of $f$. Then the entries of $\langle 0|^{(1)} N |0\rangle^{(1)}$ are
$f_0, f_1, -f_0, -f_1$, and moreover $f_0$ will occur with the same multiplicity as $-f_1$; likewise
$-f_0$ will occur with the same multiplicity as $f_1$. We see that the elements of $\mathfrak{Z}^{(1)}([a' \otimes I] D [b' \otimes I])$,
after multiplication by the scalar $(-f_0 f_1)^{-1/2}$, come in conjugate pairs. By Theorem [18], $|[a' \otimes I] D [b' \otimes I]|_{CZ;1} \le 2$. But now $|M|_{CZ;1} =
|Q|_{CZ;1} = |[a' \otimes I] D [b' \otimes I]|_{CZ;1}$, contrary to hypothesis. ■

4.3 Corollaries



The PERES gate implements a three-qubit transformation from classical reversible logic,
$\mathrm{PERES}^{(\ell;m;n)} = C^{(\ell)}X^{(m)} \cdot CC^{(\ell,m)}X^{(n)}$. As shown in [12], it can be a useful alternative to the
TOFFOLI gate in reversible circuits.

Corollary 23 $|\mathrm{PERES}|_{CZ} = 5$.

Proof. As is clear from its definition, the PERES gate can be implemented by the circuit
of Figure 1 save the rightmost CNOT. Thus, $|\mathrm{PERES}|_{CZ} \le 5$. On the other hand, it also
follows from the definition that any circuit for the PERES can, with the addition of a single
CNOT, become a circuit for the TOFFOLI. Thus $|\mathrm{PERES}|_{CZ} \ge |\mathrm{TOFFOLI}|_{CZ} - 1 = 5$, and all
inequalities are equalities. ■
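
The relation between the PERES and TOFFOLI gates used in this proof can be confirmed on classical bit-strings. The sketch below uses hypothetical helper names and indexes the qubits $\ell, m, n$ as positions 0, 1, 2; it reads the operator product right to left, so the doubly-controlled NOT acts first.

```python
def cnot(bits, control, target):
    """Classical action of C^(control) X^(target) on a tuple of bits."""
    out = list(bits)
    out[target] ^= out[control]
    return tuple(out)


def toffoli(bits, c1, c2, target):
    """Classical action of CC^(c1,c2) X^(target) on a tuple of bits."""
    out = list(bits)
    out[target] ^= out[c1] & out[c2]
    return tuple(out)


def peres(bits):
    """PERES = C^(l)X^(m) . CC^(l,m)X^(n): the Toffoli acts first, then the CNOT."""
    return cnot(toffoli(bits, 0, 1, 2), 0, 1)
```

Appending one more CNOT from position 0 to position 1 after `peres` undoes the first CNOT and leaves exactly the TOFFOLI action.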

In a different direction, we consider below multiply-controlled $Z$ gates:

Corollary 24 $|(n-1)\text{-controlled-}Z|_{CZ} \ge 2n$ for any $n \ge 3$.

Proof. We proceed by induction on $n$. Suppose the Corollary is false; choose minimal
falsifying $n$, and a falsifying circuit $\mathcal{C}$. By Theorem [15], $n > 3$. As before, at least three
CZ gates are incident to each qubit, and counting shows that at least one, say $\ell$, touches
exactly three. As before, we can assume that all one-qubit operators which appear on $\ell$
are diagonal. Form the circuit $\mathcal{C}' = \langle 1|^{(\ell)}\, \mathcal{C}\, |1\rangle^{(\ell)}$ by replacing every gate $g$ of $\mathcal{C}$ with
$g' = \langle 1|^{(\ell)}\, g\, |1\rangle^{(\ell)}$. This has no effect on gates which do not touch $\ell$; it turns one-qubit gates
on $\ell$ into scalars, and replaces $CZ^{(\ell,m)}$ with $Z^{(m)}$. At any rate, $\mathcal{C}'$ is a CZ-circuit on $(n-1)$
qubits which computes the $(n-2)$-controlled-$Z$. We deduce by induction that it contains
at least $2(n-1)$ CZ gates. Adding the (at least) three CZs incident to $\ell$, there are at least
$2n + 1$ total CZs in $\mathcal{C}$. ■



5 Three-qubit diagonal operators 

We give here a complete classification of three-qubit diagonal operators by their CZ-cost. 
Throughout this section, we assume no ancillae are available and label our qubits 1, 2, 3, 
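from most significant to least significant. We abbreviate $\langle i|^{(1)} \langle j|^{(2)} \langle k|^{(3)}\, D\, |i\rangle^{(1)} |j\rangle^{(2)} |k\rangle^{(3)}$
by $D_{ijk}$. We also write $\Lambda(\eta)$ for the one-qubit gate given by $|0\rangle\langle 0| + |1\rangle\langle 1|\,\eta$. Define

$$\lambda_1(D) = \frac{D_{011} D_{000}}{D_{001} D_{010}}, \quad \lambda_2(D) = \frac{D_{101} D_{000}}{D_{100} D_{001}}, \quad \lambda_3(D) = \frac{D_{110} D_{000}}{D_{100} D_{010}}, \quad \xi(D) = \frac{D_{111} D_{000}^2}{D_{100} D_{010} D_{001}}$$

Then any three-qubit diagonal $D$ admits the expansion

$$D = D_{000} \cdot \left[\Lambda\!\left(\frac{D_{100}}{D_{000}}\right) \otimes \Lambda\!\left(\frac{D_{010}}{D_{000}}\right) \otimes \Lambda\!\left(\frac{D_{001}}{D_{000}}\right)\right] \cdot \mathrm{diag}\big(1, 1, 1, \lambda_1(D), 1, \lambda_2(D), \lambda_3(D), \xi(D)\big)$$

The $\lambda_i(D)$ are multiplicative, $\lambda_i(DD') = \lambda_i(D)\lambda_i(D')$, and likewise for $\xi$. We denote by
$S(D)$ the ordered quadruple $(\lambda_1(D), \lambda_2(D), \lambda_3(D); \xi(D))$.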

Observation 25 For $D, D'$ three-qubit diagonal operators, $S(D) = S(D')$ iff $S(D^\dagger D') =
(1, 1, 1; 1)$ iff $D^\dagger D'$ is a tensor product of one-qubit diagonal operators. It follows that
$S(D) = S(D') \Rightarrow |D|_{CZ;i} = |D'|_{CZ;i}$.

Observation 26 $\mathfrak{Z}^{(i)}(D) = \{1, \lambda_j(D)^\dagger, \lambda_k(D)^\dagger, \xi(D)^\dagger \lambda_i(D)\}$ where $\{i, j, k\} = \{1, 2, 3\}$.
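
As an illustration of the invariants $S(D)$, the sketch below computes them from the eight diagonal entries and checks Observation 25 on an example. It assumes the formulas for $\lambda_1, \lambda_2, \lambda_3, \xi$ are the ratios $D_{011}D_{000}/(D_{001}D_{010})$, $D_{101}D_{000}/(D_{100}D_{001})$, $D_{110}D_{000}/(D_{100}D_{010})$ and $D_{111}D_{000}^2/(D_{100}D_{010}D_{001})$; the function names are hypothetical.

```python
def invariants(d):
    """S(D) for a three-qubit diagonal given as the list
    [D_000, D_001, D_010, D_011, D_100, D_101, D_110, D_111]
    (qubit 1 is the most significant index)."""
    d000, d001, d010, d011, d100, d101, d110, d111 = d
    lam1 = d011 * d000 / (d001 * d010)
    lam2 = d101 * d000 / (d100 * d001)
    lam3 = d110 * d000 / (d100 * d010)
    xi = d111 * d000 ** 2 / (d100 * d010 * d001)
    return lam1, lam2, lam3, xi


def local_diagonal(t1, t2, t3, scalar=1):
    """Diagonal entries of scalar * diag(1, t1) (x) diag(1, t2) (x) diag(1, t3)."""
    return [scalar * (t1 if i else 1) * (t2 if j else 1) * (t3 if k else 1)
            for i in (0, 1) for j in (0, 1) for k in (0, 1)]
```

With these conventions $CZ^{(1,2)}$ has $S = (1, 1, -1; -1)$ and the CCZ has $S = (1, 1, 1; -1)$, and multiplying either by a tensor product of one-qubit diagonals leaves $S$ unchanged.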

Lemma 27 A three-qubit diagonal $D$ can be implemented in a three-qubit CZ-circuit with:

• 0 CZs touching qubit 1 iff $S(D) = (\xi, 1, 1; \xi)$.

• 1 CZ touching qubit 1 iff $S(D) = (\xi, -1, -1; \xi), (-\xi, 1, -1; \xi), (-\xi, -1, 1; \xi)$.

• 2 CZs touching qubit 1 iff $S(D) = (a, b, c; abc), (a, b, c; ab/c), (a, b, c; ac/b)$.

Proof. This is just a translation of Theorem [18] using Observation [26], involving a
straightforward but tedious calculation which we omit. ■

The two possibilities S(D) = (a,b,c;abc),(a,b,c;ab/c) are quite different, and the 
following result helps distinguish between them. 

Lemma 28 Let $D$ be a three-qubit diagonal operator and $u$ be a one-qubit gate. Suppose
$|D u^{(3)} CZ^{(1,3)}|_{CZ;1} = 1$ or $|CZ^{(1,3)} u^{(3)} D|_{CZ;1} = 1$. Then $\lambda_1(D)\lambda_2(D) = \lambda_3(D)\xi(D)$.

Proof. The conclusion being stable under $D \to D^\dagger$, we assume $|D u^{(3)} CZ^{(1,3)}|_{CZ;1} = 1$.
Decompose $u = e^{i\varphi} R_z(\alpha) R_y(\beta) R_z(\gamma)$. Then the mux-spectrum $\mathfrak{Z}^{(1)}$ of this operator is given by the roots of the polynomials

$$x^2 - \cos(2\beta)\,(1 - \lambda_2(D))\,x - \lambda_2(D)$$

$$x^2 - \cos(2\beta)\,(\lambda_3(D) - \xi(D)/\lambda_1(D))\,x - \lambda_3(D)\xi(D)/\lambda_1(D)$$

For these to have roots either $\{\rho, \rho, -\rho, -\rho\}$ or $\{\rho, \rho, \rho, \rho\}$, the two equations must have
the same constant terms: either both $\rho^2$ or both $-\rho^2$. ■

We turn to computing CZ-costs. These being invariant under relabelling of qubits, we
write $s(D)$ for $\{\lambda_1(D), \lambda_2(D), \lambda_3(D); \xi(D)\}$, where we ignore the order of the $\lambda_i$.

Observation 29 Given two three-qubit diagonals $D, D'$, $s(D) = s(D')$ if and only if there
exist one-qubit diagonals $d, d', d''$ and a wire permutation $\omega$ such that $D = (d \otimes d' \otimes d'') \cdot
\omega D' \omega^\dagger$. Thus $s(D) = s(D') \Rightarrow |D|_{CZ} = |D'|_{CZ}$.

Theorem 30 Let $D$ be a three-qubit diagonal operator. Then there exists a CZ-circuit for
$D$ containing

• 0 CZs iff $s(D) = (1, 1, 1; 1)$.

• 1 CZ iff $s(D) = (1, 1, -1; -1)$.

• 2 CZs iff $s(D) = (1, 1, \xi; \xi), (1, -1, -1; 1)$.

• 3 CZs iff $s(D) = (1, 1, \xi; \xi), (\xi, -1, -1; \xi), (-\xi, 1, -1; \xi)$.

• 4 CZs iff $s(D) = (a, b, c; ab/c)$.

• 5 CZs iff $s(D) = (a, b, c; ab/c), (a, b, c; abc)$.

• 6 CZs always.

Proof. We assume without loss of generality that $D$ takes the form $\mathrm{diag}(1, 1, 1, \lambda_1, 1, \lambda_2, \lambda_3, \xi)$.
We number the qubits 1, 2, 3 from most to least significant.

($\Leftarrow$). We can assume that in fact $S(D)$ takes the form given. Our constructions will use
the CX, which may be replaced by the CZ at the cost of inserting HADAMARD gates.

Case 0. $S(D) = (1, 1, 1; 1) \Rightarrow D = I$.

Case 1. $S(D) = (1, 1, -1; -1) \Rightarrow D = CZ^{(1,2)}$.

Case 2a. $S(D) = (\xi, 1, 1; \xi)$. Fix $\eta = \sqrt{\xi}$;

[Circuit diagram: two-CX circuit for $D$ using the one-qubit gates $\Lambda(\eta)$, $\Lambda(\eta)$, $\Lambda(1/\eta)$.]


Case 2b. $S(D) = (1, -1, -1; 1) \Rightarrow D = CZ^{(1,3)} CZ^{(1,2)}$.

Case 3a. $S(D) = (\xi, 1, 1; \xi)$. By Case 2a, the CZ can be implemented in a circuit
containing 2 CZs. It follows that any operator that can be implemented with $n > 0$ CZs
can be implemented with $n + 1$. Thus since $D$ can be implemented with 2 CZs, it can be
implemented with 3.

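Case 3b. $S(D) = (\xi, -1, -1; \xi)$. Fix $\eta = \sqrt{\xi}$;

[Circuit diagram: three-CX circuit for $D$ using the one-qubit gates $\Lambda(\eta)$, $\Lambda(\eta)$, $\Lambda(1/\eta)$.]

Case 3c. $S(D) = (-\xi, 1, -1; \xi)$. Fix $\eta = \sqrt{-\xi}$;

[Circuit diagram: three-CX circuit for $D$ using the one-qubit gates $\Lambda(\eta)$, $\Lambda(\eta)$, $\Lambda(1/\eta)$.]
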

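Case 4. $S(D) = (a, b, c; ab/c)$. Fix square roots $\alpha, \beta, \gamma$ for $a, b, c$;

[Circuit diagram: four-CX circuit for $D$ using the one-qubit gates $\Lambda(\beta)$, $\Lambda(\alpha)$, $\Lambda(\alpha\beta/\gamma)$, $\Lambda(\gamma/\alpha)$, $\Lambda(1/\gamma)$, $\Lambda(\gamma/\beta)$.]
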

Case 5a. S(D) = (a,b,c;ab/c). As D can be implemented with 4 CZs, it can be imple- 
mented with 5. 

Case 5b. $S(D) = (a, b, c; abc)$. Fix square roots $\alpha, \beta, \gamma$ for $a, b, c$;

[Circuit diagram: five-CX circuit for $D$ using one-qubit gates including $\Lambda(\beta\gamma)$, $\Lambda(\alpha\gamma)$, $\Lambda(1/\gamma)$.]

Case 6. More generally, any $n$-qubit diagonal operator has CZ-cost bounded by $2^n - 2$.
See [3] or Section 2.3.
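
The two-CX construction of Case 2a can be sanity-checked by tracking computational basis states and their phases. Because the original circuit diagram is not reproduced here, the gate ordering below is an assumption: it is one standard arrangement consistent with the gate labels $\Lambda(\eta)$, $\Lambda(\eta)$, $\Lambda(1/\eta)$ and two CX gates on qubits 2 and 3. The helper names are hypothetical, and qubits 1, 2, 3 are stored at positions 0, 1, 2.

```python
import cmath


def lam(state, qubit, eta):
    """Apply Lambda(eta) = |0><0| + eta |1><1| to one qubit of a basis state."""
    bits, phase = state
    return bits, phase * (eta if bits[qubit] else 1)


def cx(state, control, target):
    """Apply a CNOT to a basis state; the accumulated phase is unchanged."""
    bits, phase = state
    flipped = list(bits)
    flipped[target] ^= flipped[control]
    return tuple(flipped), phase


def case_2a_circuit(bits, xi):
    """Assumed gate order: CX, Lambda(1/eta), CX, then Lambda(eta) on both qubits."""
    eta = cmath.sqrt(xi)
    state = (tuple(bits), 1)
    state = cx(state, 1, 2)
    state = lam(state, 2, 1 / eta)
    state = cx(state, 1, 2)
    state = lam(state, 1, eta)
    state = lam(state, 2, eta)
    return state
```

On the basis state $|a, b, c\rangle$ the accumulated phase is $\eta^{\,b + c - (b \oplus c)} = \xi^{bc}$, which is the diagonal $\mathrm{diag}(1, 1, 1, \xi, 1, 1, 1, \xi)$ with $S(D) = (\xi, 1, 1; \xi)$.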

($\Rightarrow$).

Case 0. $D$ must be locally equivalent to $I$, hence $s(D) = (1, 1, 1; 1)$.

Case 1. $D$ must be locally equivalent to some CZ, hence $s(D) = (1, 1, -1; -1)$.

Case 2. Suppose there exists a minimal implementation of $D$ in which both CZ gates
connect the same two qubits. Then $D$ is locally equivalent to a two-qubit diagonal, in
which case one can compute $s(D) = (\xi, 1, 1; \xi)$.

Otherwise, there is a minimal implementation of $D$ in which the two CZ gates are $CZ^{(i,j)}$,
$CZ^{(j,k)}$. By Corollary [13], we may pass to an implementation with only diagonal one-qubit
gates along $j$; by Corollary [10], we may pass to an implementation with only diagonal one-qubit
gates along $i, k$ as well. But then $D$ is locally equivalent to $CZ^{(i,j)} CZ^{(j,k)}$ and we may
compute $s(D) = (1, -1, -1; 1)$.

Case 3. It suffices to show that $|D|_{CZ;j} \le 1$ for some $j$. For, if $|D|_{CZ;j} = 0$, then $D$
is a two-qubit diagonal, with $s(D) = (\xi, 1, 1; \xi)$, and if $|D|_{CZ;j} = 1$, then by Lemma [27],
$s(D) = (-\xi, 1, -1; \xi)$ or $(\xi, -1, -1; \xi)$.

Consider an implementation of $D$ containing three CZs. We have $|D|_{CZ;\ell} \le 1$ for some
$\ell$ unless the CZs are distributed so that each qubit touches exactly two. Let $j$ be a qubit
touching the middle CZ. By Corollary [13], we can assume the circuit contains only diagonal
gates on qubit $j$; it follows by inspection that $D \sim_j CZ^{(i,j)} CZ^{(j,k)}$. But we have already
determined that $|CZ^{(i,j)} CZ^{(j,k)}|_{CZ;j} = 1$.

Case 4. Consider an implementation of D containing four CZs. If any qubit touches 
fewer than two CZs, we reduce to the previous case and observe that the desired condition 
on s holds. Thus suppose each qubit touches at least two CZs. Then there are only two 
possibilities for the number of CZs touched by each qubit: (2,2,4) and (2,3,3). 

For the configuration (2,2,4), say qubits $\ell, m$ touch two CZs and qubit $n$ touches four.
Note that no CZs connect $\ell, m$. Thus we may assume by Corollary [13] all one-qubit gates
on $\ell, m$ are diagonal. By Proposition [5], $\det_{\ell,m} D$ is separable; this says precisely that
$\lambda_\ell(D)\lambda_m(D) = \lambda_n(D)\xi(D)$.

For the configuration (2,3,3), say qubit 1 touches two CZs and qubits 2, 3 touch three.
Then there are two CZs connecting qubits 2 and 3, one connecting qubits 1 and 3 and one
connecting qubits 1 and 2. By Corollary [13], we ensure that all one-qubit gates on qubit 1
are diagonal. If the CZs connecting qubits 2 and 3 are outermost, $D \sim_1 CZ^{(1,2)} CZ^{(1,3)}$, hence
can be implemented with three CZs by Case 3. Otherwise, one of the CZs incident on qubit
1 is outermost; without loss of generality let it be $CZ^{(1,3)}$. Then we have an equation of
the form $D = u^{(3)} CZ^{(1,3)} A$ where by construction $A$ commutes with $Z^{(1)}$ and $|A|_{CZ;1} = 1$.
Lemma [28] yields the desired result.

Case 5. It suffices by Lemma [27] to show that $|D|_{CZ;\ell} \le 2$ for some $\ell$. Suppose not; then
in any five-CZ implementation for $D$, each qubit must touch at least three CZs. It follows that two
of the qubits, say $\ell, m$, touch exactly three CZs, and the remaining qubit touches four. By
Theorem [22], all one-qubit gates on $\ell, m$ are diagonal or anti-diagonal. Enough applications
of Equation [1] will ensure that all one-qubit gates on $\ell, m$ are in fact diagonal. Move the CZ
which connects $\ell, m$ to the edge of the circuit. This yields $D = CZ^{(\ell,m)} A$, where $|A|_{CZ;\ell} \le 2$.
By Lemma [27] it follows that $|D|_{CZ;\ell} \le 2$ as well. ■
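
The counting of CZs per qubit used in Cases 3, 4 and 5 can be reproduced by brute force. The sketch below (hypothetical function name) enumerates how many CZs connect each pair of the three qubits and records the resulting numbers of CZs touched by each qubit.

```python
from itertools import product


def degree_sequences(num_cz, min_degree):
    """Map each sorted triple (CZs touched by each qubit) to the set of
    multiplicities (e12, e13, e23) realizing it, for three-qubit circuits
    with num_cz CZs in which every qubit touches at least min_degree CZs."""
    found = {}
    for e12, e13, e23 in product(range(num_cz + 1), repeat=3):
        if e12 + e13 + e23 != num_cz:
            continue
        degrees = (e12 + e13, e12 + e23, e13 + e23)
        if min(degrees) >= min_degree:
            found.setdefault(tuple(sorted(degrees)), set()).add((e12, e13, e23))
    return found
```

With four CZs and at least two per qubit the only outcomes are (2,2,4) and (2,3,3); with five CZs and at least three per qubit the only outcome is (3,3,4); with three CZs and at least two per qubit every pair of qubits is connected exactly once.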
6 Circuits with ancillae



The proofs of Theorems [15] and [30] assume that only three qubits were present, and use this 
assumption when enumerating possible circuit configurations with a given total number of 
CZ gates. This dependency can be eliminated. Indeed, these cases involved so few CZs that 
one could eliminate configurations with ancillae by performing explicit checks. 

More significant is the use of Proposition [5] and the characterization by Theorem [18] of
$|\cdot|_{CZ;\ell} \le 2$. Both of these statements are true for any fixed $N$, but suffer when $N$ is allowed
to vary. For example if only $N = 3$ qubits are available, then $\det_{1,2}(CCZ^{(1,2,3)}) = CZ^{(1,2)}$, so
by Proposition [5] the CCZ cannot be implemented in any three-qubit CZ-circuit in which all
one-qubit gates commute with $Z^{(1)}, Z^{(2)}$. But if $N = 4$ qubits are present, $\det_{1,2}(CCZ^{(1,2,3)} \otimes I^{(4)}) = I^{(1,2)}$, so
$CCZ^{(1,2,3)} \otimes I^{(4)}$ can be implemented in a four-qubit CZ-circuit in which all one-qubit gates
commute with $Z^{(1)}$ and $Z^{(2)}$.
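
The two determinant computations can be reproduced directly for diagonal operators. This sketch assumes that $\det_{1,2}$ of a block-diagonal operator is the diagonal operator on qubits 1 and 2 whose entries are the determinants of the blocks acting on the remaining qubits, and it uses the fact that a two-qubit diagonal $\mathrm{diag}(d_{00}, d_{01}, d_{10}, d_{11})$ is a tensor product of one-qubit diagonals exactly when $d_{00} d_{11} = d_{01} d_{10}$. The function names are hypothetical.

```python
from math import prod


def block_dets(diagonal, k):
    """Block determinants of a diagonal operator with respect to its k most
    significant qubits: one product of diagonal entries per block."""
    block = len(diagonal) >> k
    return [prod(diagonal[b * block:(b + 1) * block]) for b in range(1 << k)]


def is_separable_two_qubit_diagonal(d):
    """True if diag(d00, d01, d10, d11) is a tensor product of one-qubit diagonals."""
    d00, d01, d10, d11 = d
    return d00 * d11 == d01 * d10
```

For the three-qubit CCZ the block determinants are (1, 1, 1, -1), the diagonal of a CZ, which is not separable; after tensoring with an identity on a fourth qubit they become (1, 1, 1, 1).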

Similarly, for $N = 3$ qubits, we have $\mathfrak{Z}^{(\ell)}(CCZ) = \{1, 1, 1, -1\}$ and thus by Theorem [18]
$|CCZ|_{CZ;\ell} \ge 3$. However, for $N = 4$ qubits, $\mathfrak{Z}^{(\ell)}(CCZ^{(1,2,3)} \otimes I^{(4)}) = \{1, 1, 1, -1, 1, 1, 1, -1\}$, so now
Theorem [18] implies that $|CCZ^{(1,2,3)} \otimes I^{(4)}|_{CZ;1} = 2$. Indeed:

[Circuit diagram: a four-qubit circuit for $CCZ^{(1,2,3)} \otimes I^{(4)}$ with two CZs touching qubit 1.]

On the other hand, the properties $\mathfrak{Z}^{(\ell)}(U) \cong \{1, 1, \ldots\}$ and $\mathfrak{Z}^{(\ell)}(U) \cong \{1, -1, 1, -1, \ldots\}$
are stable under adding ancillae. By Theorem [18], so are the properties $|U|_{CZ;\ell} = 0$ and
$|U|_{CZ;\ell} = 1$. Since only these properties are used in the proof of Lemma [28], it too holds
even in the presence of ancillae. This leads to an extension of the CZ-cost classification of
three-qubit diagonals to the case where ancilla qubits are permitted.



Lemma 31 Let $A$ be a unitary operator; let $\mathcal{C}$ be qubit-minimal among CZ-circuits computing
$A$, possibly with the use of ancillae, using only $|A|^{a}_{CZ}$ CZ gates. Then every ancilla
in $\mathcal{C}$ touches at least three CZ gates.

Proof. Fix an ancilla qubit $\ell$. If no CZ gates touch $\ell$, then it may be removed. If one
(respectively two) CZ touches $\ell$, then by Corollary [10] (respectively Corollary [13])
there is a circuit with no more CZs in which the only one-qubit gates on $\ell$ are diagonal.

Now form the circuit $\langle 0|^{(\ell)}\, \mathcal{C}\, |0\rangle^{(\ell)}$ as in the proof of Corollary [24]. This circuit computes
the operator $A$ using one fewer ancilla and fewer CZs than $\mathcal{C}$. ■



Corollary 32 For any two-qubit operator $V$, $|V|^{a}_{CZ} = |V|_{CZ}$.

Proof. If no ancillae are needed to minimize CZ-count, then the result holds. Otherwise,
each ancilla used in a qubit-minimal CZ-minimal implementation must touch at least three
CZ gates. Thus $|\cdot|_{CZ} \ge |\cdot|^{a}_{CZ} \ge 3$. However it is known [23][22][16] that two-qubit operators
have $|\cdot|_{CZ} \le 3$. Thus all the inequalities are equalities. ■



Proposition 33 For any three-qubit diagonal operator $D$, $|D|^{a}_{CZ} = |D|_{CZ}$.

Proof. Suppose $|D|^{a}_{CZ} < |D|_{CZ}$. By Lemma [31], a qubit-minimal circuit for $D$ achieving the
bound for $|D|^{a}_{CZ}$ contains at least three CZ gates incident on each ancilla. By assumption at
least one ancilla is used, so $|D|_{CZ} > |D|^{a}_{CZ} \ge 3$. It follows from Theorem [30] and Lemma
[27] that $|D|_{CZ;\ell} > 1$ for the three qubits $\ell = 1, 2, 3$. By Theorem [18], this property is stable
under addition of ancilla. Thus a qubit-minimal circuit for $D$ achieving the bound for $|D|^{a}_{CZ}$
contains at least 3 CZs incident to each ancilla, and at least 2 CZs incident to each non-ancilla
qubit. If $k$ ancillae are used, then, since each CZ touches exactly two qubits, we have $|D|^{a}_{CZ} \ge (3k + 6)/2$. From Theorem [30]
and the supposition we have $|D|^{a}_{CZ} < |D|_{CZ} \le 6$; it follows that $k = 1$, that $|D|^{a}_{CZ} = 5$, and
that $|D|_{CZ} = 6$.

In any four-qubit, five-CZ circuit for $D$, we must have two of the non-ancilla, say $x_1, x_2$,
touching two CZs, and both the remaining non-ancilla $z$ and the ancilla $a$ touching three.
By Corollary [13], we can assume that the only one-qubit operators appearing on $x_1, x_2$ are
diagonal. We may also assume that the graph where vertices are qubits and edges are CZ
gates is connected; otherwise $D$ could be split into the tensor product of a two-qubit and a
one-qubit diagonal, and hence would have $|D| \le 2$. Then there are only three possibilities
regarding which wires are connected by CZs.
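
The count of three possibilities can be confirmed by enumeration. The sketch below (hypothetical names) lists all ways to place CZs among the qubits $x_1, x_2, z, a$ so that $x_1, x_2$ each touch two CZs, $z$ and $a$ each touch three, and the graph is connected; configurations that differ only by exchanging $x_1$ and $x_2$ are identified, which is an assumption about how the possibilities are counted.

```python
from itertools import product

QUBITS = ("x1", "x2", "z", "a")
PAIRS = [("x1", "x2"), ("x1", "z"), ("x1", "a"),
         ("x2", "z"), ("x2", "a"), ("z", "a")]
TARGET = {"x1": 2, "x2": 2, "z": 3, "a": 3}  # CZs touched by each qubit


def is_connected(edges):
    """Graph search from x1 over pairs joined by at least one CZ."""
    reached, frontier = {"x1"}, ["x1"]
    while frontier:
        q = frontier.pop()
        for (u, v), mult in edges.items():
            if mult and q in (u, v):
                other = v if q == u else u
                if other not in reached:
                    reached.add(other)
                    frontier.append(other)
    return len(reached) == len(QUBITS)


def canonical(edges):
    """Representative of a configuration up to exchanging x1 and x2."""
    swap = {"x1": "x2", "x2": "x1", "z": "z", "a": "a"}

    def key(e):
        return tuple(sorted((tuple(sorted(p)), m) for p, m in e.items() if m))

    swapped = {(swap[u], swap[v]): m for (u, v), m in edges.items()}
    return min(key(edges), key(swapped))


def configurations():
    classes = set()
    for mults in product(range(4), repeat=len(PAIRS)):
        edges = dict(zip(PAIRS, mults))
        degree = {q: sum(m for p, m in edges.items() if q in p) for q in QUBITS}
        if degree == TARGET and is_connected(edges):
            classes.add(canonical(edges))
    return classes
```

Exactly three classes are found: one containing an $(x_1, x_2)$ CZ, one in which each $x$-qubit is joined by a doubled CZ to a single neighbour, and one in which each $x$-qubit is joined once to $z$ and once to $a$.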
We will show that any circuit with those CZ gates can be transformed so that (*) a CZ
which does not touch the ancilla is outermost among the CZs, and (**) one of the $x$-qubits
on which this CZ gate acts has the property that all one-qubit gates acting on it are diagonal.
As this $x$-qubit only touched 2 CZ gates to begin with, it follows from Lemma [28] that $s(D)$
takes the form $(a, b, c; ab/c)$. By Theorem [30], $|D|_{CZ} \le 4$, which is a contradiction.

We return to checking (*) and (**). Eliminate non-diagonal one-qubit gates on $x$, using
Corollary [13]. In Case (I), the $(x_1, x_2)$ CZ can therefore only be prevented from moving by
the $(x_1, a)$. This can be on only one side, so the $(x_1, x_2)$ can be moved outwards to the
other. Similarly, in Case (II), an $(x, z)$ can only be blocked by $(z, a)$ and the other $(x, z)$. In
this case, the second $(x, z)$ is blocked on only one side and can be moved to the edge. In
Case (III), we use Corollary [13] to clear both the $x_1$ and $x_2$ qubits of non-diagonal gates;
the possible additional one-qubit gates will only fall on the $z$ and $a$ qubits. Now the $(x_1, z)$
can only be blocked by the $(x_2, z)$ and the $(z, a)$, and also the $(x_2, z)$ can only be blocked by
$(z, a)$ and $(x_1, z)$. Thus one of $(x_1, z)$ and $(x_2, z)$ can be made outermost. ■

Corollary 34 $|CCZ|^{a}_{CZ} = |\mathrm{TOFFOLI}|^{a}_{CZ} = 6$ and $|\mathrm{PERES}|^{a}_{CZ} = 5$.

7 Conclusion 

While our work is primarily focused on quantum circuit implementations, the TOFFOLI
gate originally arose as a universal gate for classical reversible logic [21]. In contrast,
the NOT and CNOT gates are not universal for reversible logic: their action on bit-strings
is affine-linear over $\mathbb{F}_2$, and thus the same is true for any operator computed by any
circuit containing only these gates.
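
The affine-linearity obstruction can be illustrated on three bits. The sketch below uses hypothetical names and relies on the fact that a map $f$ on bit-strings is affine over $\mathbb{F}_2$ exactly when $f(x \oplus y \oplus z) = f(x) \oplus f(y) \oplus f(z)$ for all $x, y, z$.

```python
from itertools import product


def xor(u, v):
    return tuple(a ^ b for a, b in zip(u, v))


def is_affine(f, n):
    """Test f(x ^ y ^ z) == f(x) ^ f(y) ^ f(z) over all n-bit strings."""
    points = list(product((0, 1), repeat=n))
    return all(f(xor(xor(x, y), z)) == xor(xor(f(x), f(y)), f(z))
               for x in points for y in points for z in points)


def not_cnot_circuit(bits):
    """An arbitrary example built from NOT and CNOT gates only."""
    a, b, c = bits
    a ^= 1   # NOT on the first bit
    b ^= a   # CNOT from the first bit to the second
    c ^= b   # CNOT from the second bit to the third
    return (a, b, c)


def toffoli(bits):
    a, b, c = bits
    return (a, b, c ^ (a & b))
```

The NOT/CNOT example passes the test while the TOFFOLI fails it, for instance at $x = 100$, $y = 010$, $z = 000$.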

Augmenting CNOT gates with single-qubit rotations to express the TOFFOLI gate provides
the lacking non-linearity. Thus the number of one-qubit gates (excluding inverters)
needed to express the TOFFOLI, or more generally any reversible computation, can be
thought of as a measure of its non-linearity. In this inverted cost model (also relevant to
some quantum implementation technologies) the following question remains open: how
many one-qubit gates are needed to implement the TOFFOLI? Furthermore, are there circuits
that simultaneously minimize the number of CNOT and one-qubit gates?

In a different direction, recall our results showing that diagonality and block-diagonality 
of an operator impose strong constraints on small circuits that compute this operator. We 
believe other conditions may act in a similar way. In particular, we ask what can be said 
about minimal quantum circuits for operators computable by classical reversible circuits, 
i.e., operators expressed by 0-1 matrices? Very little is known even for three-qubit opera- 
tors. In particular, the CNOT-cost of the controlled-swap (Fredkin gate) remains unresolved. 

Closest to our present work, the exact CNOT-cost of the $n$-qubit analogue of the TOFFOLI
gate remains unknown. We have shown that $2n$ CNOTs are necessary if ancillae are not permitted,
but already for $n = 4$ we only know that $8 \le |CCCZ|_{CZ} \le 14$, where the upper bound
is provided by a generic decomposition of diagonal operators [3]. Existing constructions
of the $n$-qubit TOFFOLI gate require a quadratic number of CNOT gates without the use of
ancillae. With one ancilla, such constructions require linearly many CNOTs, but the leading 
coefficient is in double-digits lfTl[l2"Tl. 

Finally, we hope that our proof can be simplified and our techniques generalized. In 
particular, we have relied on repeated comparisons of various Cartan decompositions to 
each other. A careful study of the proof will reveal the simultaneous use of six Cartan 
decompositions — those corresponding to conjugation by X and Z on each of three wires. 
Keeping track of these decompositions in a more systematic manner may simplify the 
proof, while using additional decompositions may lead to new results. A related challenge 
is gauging the power of the qubit-by-qubit gate counting we have used. It follows from the 
results of [18] that $|U|_{CZ;\ell} \le 6(n - 1)$ for $U$ an $n$-qubit operator, and hence no technique
relying solely on this process can achieve better than a quadratic lower bound. On the
other hand, we have only been able to characterize cases when $|U|_{CZ;\ell} > 2$, and thus have
achieved only linear lower bounds.
Appendix: Proof of Proposition 5



Below we restate Proposition [5] and complete its proof. 

Proposition 5 Fix qubits $\ell_1, \ldots, \ell_k$ among $N \ge k$ qubits. A unitary $U$ commuting with
$Z^{(\ell_1)}, \ldots, Z^{(\ell_k)}$ can be implemented by a CZ-circuit in which only diagonal gates operate on
qubits $\ell_i$ if and only if $\det_{\ell_1 \ldots \ell_k}(U)$ is separable (can be implemented by one-qubit gates).

Proof. ($\Rightarrow$). It suffices to show the separability of $\det_{\ell_1 \ldots \ell_k}(U)$ for a small generating set of
operators. Direct calculation confirms this for (i) CZ gates, (ii) diagonal one-qubit gates on
the $\ell_i$, and (iii) any gate not affecting qubits $\ell_1, \ldots, \ell_k$.

($\Leftarrow$). By hypothesis, $\det_{\ell_1 \ldots \ell_k}(U)$, and hence $\mathcal{D} = \det_{\ell_1 \ldots \ell_k}(U)^{1/2^{N-k}}$, can be implemented
using only one-qubit diagonal gates. It remains to implement $\tilde{U} = U / \mathcal{D}$, which
satisfies the normalization $\tilde{U}_{j_1 \ldots j_k} \in SU(2^{N-k})$. We will construct a circuit for $\tilde{U}$ by multiplexing
circuits for $\tilde{U}_{j_1 \ldots j_k}$. Let $\mathcal{C}$ be a $(N-k)$-qubit circuit containing only CZs and
one-qubit $R_x, R_y, R_z$ gates such that any operator in $SU(2^{N-k})$ can be implemented by making
the appropriate choice of parameter for the $R_x, R_y, R_z$ gates. Such universal circuits
exist JT); see Section 2.3 for modern constructions. Choose specifications $\mathcal{C}_{j_1 \ldots j_k}$ implementing
the $\tilde{U}_{j_1 \ldots j_k}$; let the $s$-th rotation gate in $\mathcal{C}_{j_1 \ldots j_k}$ be given by $R_{d(s)}^{(q(s))}(\theta_{j_1 \ldots j_k}(s))$,
where $q(s)$ is a qubit, $\theta_{j_1 \ldots j_k}(s)$ is an angle, and $d(s) = x, y, z$. Define $\Theta(s)$ to be the real
diagonal operator on qubits $\ell_i$ such that $\Theta(s)_{j_1 \ldots j_k} = \theta_{j_1 \ldots j_k}(s)$. Form the $N$-qubit circuit
$\tilde{\mathcal{C}}$ by replacing the $s$-th rotation gate of $\mathcal{C}$ by the multiplexed rotation $R_{d(s)}(\Theta(s))^{(q(s))}$;
then $\tilde{\mathcal{C}}$ implements $\tilde{U}$. Implement $R_{d(s)}(\Theta(s))^{(q(s))}$ by a CZ-circuit containing no one-qubit
operator on any qubit save $q(s)$, which is not one of the $\ell_i$ (see [13] or Section 2.3). ■

Corollary 35 $N$-qubit operators which commute with $Z$ on $k$ qubits can be implemented
using on the order of $2^k 4^{N-k}$ one-qubit and CZ gates.$^5$

Proof. This follows from the construction in the proof of Proposition [5] and the known
estimates in the cases $k = 0, N - 1$ [13] and $k = N$ [3]. ■



$^5$ Dimension-counting following [9] shows that roughly this many are necessary for almost all such operators.